
Visual Studio 2005 Technical Articles

Next-Generation Data Access: Making the Conceptual Level Real

José Blakeley, David Campbell, Jim Gray, S. Muralidhar, Anil Nori

June 2006

Applies to: ADO.NET, .NET Language Integrated Query (LINQ), SQL Server

Summary: Eliminate the impedance mismatch for both applications and data services like reporting, analysis, and replication offered as part of the SQL Server product by raising the level of abstraction from the logical (relational) level to the conceptual (entity) level. (31 printed pages)

Contents

Abstract
Introduction
Impedance Mismatch
Database Modeling Layers
Data Services Evolution
The Vision
Making the Conceptual Level Real
Entity Framework Architecture
References

Abstract

Significant technology and industry trends have fundamentally changed the way that applications are being built. Line of business (LOB) applications that were constructed as monoliths around a relational database system 10-20 years ago must now connect with other systems and produce and consume data from a variety of disparate sources. Business processes have moved from semi-automated to autonomous. Service oriented architectures (SOA) introduce new consistency and coordination requirements. Higher level data services, such as reporting, data mining, analysis, synchronization, and complex integration, have moved from esoteric to mainstream.

A common theme throughout all modern application architectures is the need to transform data from one form to another to have it in the right form for the task at hand. Today's applications sport a number of data transformers. A common transformation, usually encapsulated as a proprietary data access layer inside applications, is designed to minimize the impedance mismatch between application objects and relational rows. However, other mappings, such as object-XML and relational-XML, exist as well. This impedance mismatch is not unique to applications. As SQL Server has evolved as a product, it has had to add a number of these modeling and mapping mechanisms across the services it provides within the product. Most of these mappings are produced in a point-to-point fashion, and each requires a different means to describe the point-to-point transformation.

A fundamental insight is that most traditional data centric services, such as query, replication, and ETL, have been implemented at the logical schema level. However, the vast majority of new data centric services operate best on artifacts typically associated with a conceptual data model. The essence of our data platform vision is to elevate Microsoft's data services, across several products, from their respective logical schema levels to the conceptual schema level. Reifying the conceptual schema layer allows us to create services around common abstractions and share tooling, definitions, and models across the majority of our data services. We will demonstrate in this paper how this shift will profoundly impact our ability to provide value across our entire application platform.

This paper focuses on programming against data and on how, by raising the level of abstraction from the logical (relational) level to the conceptual (entity) level, we can eliminate the impedance mismatch for both applications and data services like reporting, analysis, and replication offered as part of the SQL Server product. The conceptual data model is made real by the creation of an extended relational model, called the entity data model (EDM), that embraces entities and relationships as first class concepts; a query language for the EDM; a comprehensive mapping engine that translates from the conceptual to the logical (relational) level; and a set of model-driven tools that help create entity-object, object-xml, and entity-xml transformers. Collectively, all these services are called the Entity Framework. ADO.NET, the Entity Framework, and .NET Language Integrated Query (LINQ) innovations in C# and Visual Basic represent a next-generation data access platform for Microsoft.

Introduction

The Microsoft Data Access vision supports a family of products and services so customers derive value from all data, from birth through archival. While the vision statement does not spell this out explicitly, its goal is to provide products and services for data across all tiers of the application (solution). Such a complete data platform must have the following characteristics:

Data in All Tiers. A complete data platform provides data management and data access services everywhere. In the client-server world, "everywhere" includes data services on the client and the data server; in the enterprise world, "everywhere" includes the data server tier, the app server (mid) tier, and the client tier; the mobile world includes the mobile device tier; and the next generation web (cloud) world includes data in the shared web space.

All types of Data. Increasingly, applications incorporate a variety of data—e.g. XML, email, calendar, files, documents, and structured business data. The Microsoft Data Access vision supports an integrated store vision that can store and manage all of this data, secure it, search and query it, analyze it, share it, synchronize it, etc. Such an integrated store includes the core data management capabilities and a platform for application development.

Uniform Data Access. While applications in different tiers require different kinds of data management services, they all expect (require) significant uniformity in the application development environment (programming models and tools). Often, the same application may be deployed across multiple tiers (e.g. on devices and on desktops), and it is highly desirable to develop once and deploy on different tiers. In addition, as an application's scale needs increase and the application moves up tiers, it must be possible to upsize the database (e.g. from SQL Everywhere to SQL Express to SQL Server) without requiring (significant) application changes. Support for uniform application development requires: rich data modeling to match the required abstractions for application data, a rich and consistent programming environment, and tools for all data.

End-to-End Business Insight. End-to-end business insight is all about enabling better decision making. It is about the technology that can enable our customers to collect, clean, store and prepare their business data for the decision making process. It is also about the experiences that the Business Users and Information Workers will have when accessing, analyzing, visualizing and reporting on the data while gathering the information necessary for their decisions.

Ubiquitous Data Services. Applications invest significant effort in (custom) development of services like data security, synchronization, serialization for data exchange (or for web services), analytics, and reporting over data in all tiers, over abstractions that are close to the applications' perspective. The Microsoft Data Access vision calls for offering such services, in a uniform manner, across data in all tiers.

Rich "Abilities". Data is a key asset, whether it is an enterprise, corporate, or a home user. Customers want their data to be always available; it must be secured; the access must be performant; their applications must scale and supportable. The Microsoft Data Access vision presumes rich "abilities" on all data. In addition, it provides ease of management of all data, thereby significantly minimizing the TCO of the data.

Impedance Mismatch

A key issue addressed by the next-generation data access platform is the well-known application impedance mismatch problem. Consider the way in which data access applications are written today. Data access code has not changed significantly in the last 10-15 years. The data access patterns introduced in ODBC are still present in OLE-DB, JDBC, and ADO.NET. Here is an example of ADO.NET code today; similar examples can be written in other APIs.

class DataAccess
{
    static void GetNewOrders(DateTime date, int qty)
    {
        using (SqlConnection con = new SqlConnection(Settings.Default.NWDB))
        {
            con.Open();

            SqlCommand cmd = con.CreateCommand();
            cmd.CommandText = @"
                SELECT o.OrderDate, o.OrderID, SUM(d.Quantity) AS Total
                FROM Orders AS o
                LEFT JOIN [Order Details] AS d ON o.OrderID = d.OrderID
                WHERE o.OrderDate >= @date
                GROUP BY o.OrderID, o.OrderDate
                HAVING SUM(d.Quantity) >= 1000";
            cmd.Parameters.AddWithValue("@date", date);

            DbDataReader r = cmd.ExecuteReader();
            while (r.Read())
            {
                Console.WriteLine("{0:d}:\t{1}:\t{2}",
                    r["OrderDate"], r["OrderID"], r["Total"]);
            }
        }
    }
}

This code presents several inconveniences to the developer. The query is expressed as a text string opaque to the programming language. The query includes a left-outer join needed to assemble rows from the normalized orders and order details tables and is not directly related to the business request. Results are returned in untyped data records. More elegant code that leverages the Entity Framework in ADO.NET as well as the language integration innovations in .NET Language Integrated Query (LINQ) would be:

class DataAccess
{
    static void GetNewOrders(DateTime date, int qty)
    {
        using (NorthWindDB nw = new NorthWindDB())
        {
            var orders = from o in nw.Orders
                         where o.OrderDate > date
                         select new
                         {
                             o.OrderID,
                             o.OrderDate,
                             Total = o.OrderLines.Sum(l => l.Quantity)
                         };

            foreach (var o in orders)
            {
                Console.WriteLine("{0:d}\t{1}\t{2}", o.OrderDate, o.OrderID, o.Total);
            }
        }
    }
}

While .NET applications like to access data as CLR objects, most data services like reporting and replication prefer to access data as entity values. Such data services would formulate dynamic queries over entities as follows:

MapCommand cmd = con.CreateCommand();
cmd.CommandText = @"
    SELECT o.OrderDate, o.OrderId, o.Total
    FROM (SELECT o.OrderId, o.OrderDate,
                 SUM(o.OrderDetails.Quantity) AS Total
          FROM Orders AS o
          WHERE o.OrderDate >= @date) AS o
    WHERE o.Total >= 1000";
cmd.Parameters.AddWithValue("@date", date);

There is a strong need for applications and data services to manipulate data at the conceptual level and to have mechanisms built into their development frameworks to easily transform these concepts into objects that are more natural for the application to reason with.

While the innovations introduced by the Entity Framework in ADO.NET and LINQ bring an exciting new world to application developers by solving the long-standing impedance mismatch problem, our responsibility as a platform vendor is to introduce these innovations in an evolutionary manner that preserves existing code investments in ADO.NET 2.0. Specifically, we will ensure that no rewriting of existing applications is needed to leverage some of the benefits of the Entity Framework, such as a rich conceptual model, data independence of ADO.NET code from the logical (relational) schema, and an entity query language firmly rooted in SQL. We will ensure ADO.NET applications can opportunistically adopt capabilities of the Entity Framework when appropriate. An important design goal of the Entity Framework is to provide fine-grained control and extensibility to the developer. This means we do not provide an 80 percent solution that hits a brick wall; instead, applications written against the Entity Framework can reach down to lower-level ADO.NET components when needed. The integration of LINQ on top of entities allows ADO.NET developers to enjoy the innovations of LINQ and the powerful relational-to-entity mapping capabilities offered by the Entity Framework.

Database Modeling Layers

In order to describe the higher level modeling capabilities of the next-generation data platform, we need to introduce some basic database design concepts. Today's dominant information modeling methodology for producing database designs factors an information model into four main levels: physical, logical (relational), conceptual, and programming or presentation, described below.

Physical Model Level

The physical model describes how data is represented in physical resources such as memory, wire or disk. The concepts discussed in this layer include record formats, file partitions and groups, heaps, and indexes. The physical model is described in this document primarily for completeness since it is typically invisible to the application. Changes to the physical model usually do not impact the application logic, and manifest themselves only in the way the application performs. Applications usually target the logical or relational data model described next.

Logical Model Level

A logical data model is a complete and precise information model of the target domain. The relational model, due to its dominance, is typically the target representation of a logical data model. The concepts discussed at the logical level include tables, rows, and primary key-foreign key constraints. This is the layer where database normalization concepts (i.e., vertical partitioning of data) are used. In fact, it is not uncommon to find relational schemas that display a high degree of normalization. The logical model level has been the target of relational database applications for the last 20 years. The mapping between the logical and physical levels is entirely outside the realm of the application and is performed implicitly by the relational database system. The ability to isolate applications written against the logical level from changes at the physical level, such as adding or dropping indexes and physical reorganization of records, is referred to as data independence and is considered one of the main benefits of the relational model. In a relational database system, the SQL query language operates over tables and views but does not (typically) have to know about the various indexes available at the physical layer.

The sometimes high degree of normalization found in relational schemas today helps to satisfy important application requirements such as data consistency and increased concurrency with respect to updates and OLTP performance. However, normalization introduces challenges to applications in that data at the logical level is too fragmented and the application logic needs to aggregate rows from multiple tables into higher level entities that more closely resemble the artifacts of the application domain. The conceptual level introduced in the next section is designed to overcome these challenges.

Conceptual Model Level

The conceptual model captures the core information entities from the problem domain and their relationships. A well-known conceptual model is the Entity-Relationship Model introduced by Peter Chen in 1976 [CHEN76]. UML is a more recent example of a conceptual model [UML].

Most significant applications involve a conceptual design phase early in the application development lifecycle. Today, many people interpret "conceptual" as "abstract" because the conceptual data model is captured inside a database design tool that has no connection with the code and the logical relational schema used to implement the application. The database design diagrams usually stay "pinned to a wall" growing increasingly disjoint from the reality of the application implementation with time. However, a conceptual data model can be as real, precise, and focused on the concrete "concepts" of the application domain as a logical relational model. There is no reason why a conceptual model could not be embodied concretely by a database system. A goal of the Microsoft Data Access vision is to make the conceptual data model a concrete feature of the data platform.

Most people equate the task of transforming a conceptual model into a logical model with applying normal-form transformations to produce a logical relational model. This doesn't need to be the case, as we'll show later. Just as relational systems provide data independence between the logical and physical levels, a system implementing a conceptual model can provide data independence between the conceptual and logical levels. The isolation of applications targeting the conceptual level from logical level changes is highly desirable. In a recent survey conducted by the Microsoft Developer Division, the need to isolate applications from changes to the (relational) logical level was one of the top ranked feature requests.

The conceptual model in the data platform is embodied by the Entity Data Model (EDM) [EDM]. The central concepts in the EDM are entities and relationships. Entities are instances of Entity Types (e.g., Customer, Employee) which are richly structured records with a key. An entity key is formed from a subset of properties of the Entity Type. The key (e.g., CustId, EmpId) is a fundamental concept to uniquely identify and update entity instances and to allow entity instances to participate in relationships. Entities are grouped in Entity Sets (i.e., Customers is a set of Customer instances). Relationships relate entity instances and are instances of Relationship Types (e.g., Employee WorksFor Department). Relationships are grouped in Relationship Sets.

The EDM works in conjunction with the eSQL query language [ESQL], which is an evolution of SQL designed to enable set-oriented, declarative queries and updates over entities and relationships. EDM can work with other query languages as well. EDM and eSQL together represent a conceptual data model and query language for the data platform and have been designed to enable business applications such as CRM and ERP, data services such as reporting, analysis, and synchronization, and applications to model and manipulate data at a level of abstraction and semantics closer to their needs. The EDM is a value-based conceptual model. It does not incorporate behaviors of any kind. It is also the basis for the programming/presentation level described in the next section.
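To give a concrete feel for querying at this level, the sketch below issues a set-oriented eSQL query through the MapConnection/MapCommand pattern used elsewhere in this paper. It is illustrative only: the Customers entity set and its Address complex-typed property are taken from the Northwind model developed later, the exact eSQL and provider surface may differ in the shipped product, and using directives are omitted as in the paper's other listings.

public static void PrintCustomersInCity(string city)
{
    //--- illustrative only; follows the MapCommand pattern shown later in this paper
    using (MapConnection conn = new MapConnectionFactory().GetMapConnection())
    {
        conn.Open();
        MapCommand cmd = conn.CreateCommand();

        // A declarative, set-oriented query over entities; Address is a complex-typed property.
        cmd.CommandText = @"
            SELECT VALUE c
            FROM Customers AS c
            WHERE c.Address.City = @city";
        cmd.Parameters.Add(new MapParameter("city", city));

        DbDataReader reader = cmd.ExecuteReader();
        while (reader.Read())
        {
            // Entity properties are assumed to surface as fields of the returned records.
            Console.WriteLine(reader["CompanyName"]);
        }
    }
}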

Programming and Presentation Level

The entities and relationships of the conceptual model usually need to be manifested in different forms based on the task at hand. Some entities need to be transformed into objects amenable to the programming language implementing the application business logic. Some entities need to be transformed into XML streams as a serialization format for web services invocations. Other entities need to be transformed into in-memory structures such as lists, dictionaries, or data tables for the purposes of UI data binding. Naturally, there is no universal programming model or presentation form; thus applications need flexible mechanisms to transform entities into the various presentation forms.
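As a small, self-contained illustration of this point (not Entity Framework code), the following sketch surfaces a hypothetical Order entity as a CLR class and then renders the same state as an XML stream with the standard XmlSerializer; the Order class and its members are assumptions made for the example.

using System;
using System.IO;
using System.Xml.Serialization;

// A hypothetical presentation of the conceptual Order entity as a CLR class.
public class Order
{
    public int OrderID;
    public DateTime OrderDate;
    public decimal Total;
}

public static class PresentationForms
{
    public static void Main()
    {
        // The same conceptual entity realized as an object for business logic...
        Order order = new Order { OrderID = 10248, OrderDate = DateTime.Today, Total = 1250m };

        // ...and rendered as an XML stream, e.g. as a web service payload.
        XmlSerializer serializer = new XmlSerializer(typeof(Order));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, order);
            Console.WriteLine(writer.ToString());
        }
    }
}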

Often, proponents of a particular presentation or programming model will argue that their particular "presentation" view is the one truth. We believe there is no "one proper presentation model"; the real value is in making the conceptual level real and then being able to use that model as the basis for mapping to and from various presentation models and other higher level services. Most developers, and most of our modern services as we will point out later, want to reason about high-level concepts such as an "Order" (see Figure 1), not about the several tables that it is normalized over in a relational database schema. They want to query, secure, program against, and report on the order. Most developers implicitly think about the order when they design their application. It may be in their head, a UML diagram, or their whiteboard. An order may manifest itself at the presentation/programming level as a class instance in Visual Basic or C# encapsulating the state and logic associated with the order, or as an XML stream for communicating with a web service.


Figure 1. Physical, logical, conceptual and multiple programming and presentation views of an Order.

Data Services Evolution

This section describes the platform shift that motivates the need for a higher level data model and data platform. We will look at this through two perspectives: application evolution and SQL Server's evolution as a product. The key point here is that the impedance mismatch problem illustrated earlier is not unique to applications; it is also a challenge in building higher-level data services such as reporting and replication.

Application Evolution

Data-based applications 10-20 years ago were typically structured as data monoliths: closed systems with logic factored by verb-object functions that interacted with a database system at the logical schema level. Let's take an order entry system as an example.

Order Entry Circa 1985

A typical order entry system built around a relational database management system (RDBMS) 20 years ago would have logic partitioned around verb-object functions associated with how users interacted with the system. In fact, the user interaction model via "screens" or "forms" became the primary factoring for logic—there would be a new-order screen and an update-customer screen. The system may have also supported batch updates of SKUs, inventory, etc. See Figure 2.

The key point is that the application logic was tightly bound to the logical relational schema. The new-order screen would reference an existing customer and assemble the order information from the products, orders, and order details tables. Submission of the screen would begin a transaction and invoke logic to insert a new order in the "orders" table containing date, order status, customer ID, ship-to, bill-to, etc. The new-order routine would also insert a number of line items into the order-details table. Many first-generation RDBMSs did not support foreign key constraint checking at the database level, so the application logic had to ensure that every new order had a valid customer ID and that deleting an order-header row also deleted all associated order-details rows. The batch processes to update master data such as SKU and pricing information also worked at the logical schema level. People typically wrote batch programs to interact directly with the logical schema to perform these updates. One other important point to make is that, 20 years ago, virtually all of the database services (insert, delete, query, bulk load) were also built at the logical schema level. Programming languages did not support representation of high-level abstractions directly—objects did not exist.


Figure 2. Order Entry System circa 1985

These applications can be characterized as being closed systems whose logical data consistency was maintained by application logic implemented at the logical schema level. An order was an order because the new-order logic ensured that it was.

Order Entry Circa 2005

Several significant trends have shaped the way that modern data-based applications are factored and deployed. Chief among these are object oriented factoring, service level application composition, and higher level data services.

Object Oriented Factoring

Object oriented methodologies replaced verb-object functional factoring by associating logic with "Business Objects". Business objects typically follow the conceptual "entity" factoring that would have been created as part of a conceptual schema model. Entities such as "Customers", "Orders", "ProductMaster" become business objects. Relationships between business object entities, captured during conceptual schema design, are implemented as methods on business objects. Thus, the "Customer" business object has a method called "GetOpenOrders" which returns a collection of that customer's open orders. With the advent of business objects, an additional abstraction layer was added in the application and, rather than having free variables collected in a new-order screen be sent to the database as columns in rows, the business object maintained the state of the customer entity which could be retrieved or persisted to the database implicitly or explicitly.
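A rough, hand-rolled sketch of such a business object is shown below; it is not Entity Framework code, and the Customer, Order, and OrderStatus types are assumptions made purely for illustration.

using System.Collections.Generic;

public enum OrderStatus { Open, Shipped, Closed }

// Hypothetical business object following the conceptual "Order" entity.
public class Order
{
    public int OrderID;
    public OrderStatus Status;
}

// Hypothetical business object: the state of the Customer entity plus its behavior.
public class Customer
{
    public string CustomerID;
    public string CompanyName;

    private readonly List<Order> orders = new List<Order>();

    public void AddOrder(Order order)
    {
        orders.Add(order);
    }

    // The Customer-Order relationship captured during conceptual design surfaces as a method.
    public List<Order> GetOpenOrders()
    {
        return orders.FindAll(o => o.Status == OrderStatus.Open);
    }
}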

Service Level Application Composition

An order entry application 20 years ago was typically implemented as a single monolith around an RDBMS. Reference and resource data, such as pricing, product master, and true inventory were generally updated at the logical schema level via batch jobs that extracted the data from other systems into flat files and then loaded them into the order entry system via bulk load utilities at the logical schema level.

Today's order entry system is more likely composed of services provided by other systems. These services are invoked on an as-needed basis. Furthermore, the order entry system must respond to service requests and events initiated by other systems. Let's look at inventory management. Instead of getting a batch update of inventory levels from another system on a daily basis, it is more likely that an order entry system will send a service request to a distribution system to check on inventory levels. Or, an order management system might simply receive stock level notifications, such as "In Stock" or "Backordered", from the distribution system. These stock level notifications might be fairly stateful, such as an "inventory low" notification that lets the order management system know that there are only 3 days of inventory available at current forecast sales. Or a "Backordered" stock notification could indicate when the next shipment is due to arrive at the distribution facility. A key point is that "StockNotification" is a conceptual entity. In our order entry system, a "StockNotification" may be resident in memory as a business object; stored in a durable queue between systems; saved in a database for subsequent analytics to evaluate supplier performance; or serialized as XML as a web service "noun", etc. In all of these forms "StockNotification" still contains the same conceptual structure; however, its logical schema, its physical representation, and the services that can be bound to its current representation depend on the current use context. What this means is that a data centric view of these conceptual entities offers an opportunity to abstract and provide additional platform services over them. In essence, the data representing the state of a concept in a modern application needs to be mapped into various representations and bound to various services throughout the application.

View of Today's Order Entry System

When we think about the factoring, composition, and services from above, we can see that the conceptual entities are an important part of today's applications. It is also easy to see how these entities must be mapped to a variety of representations and bound to a variety of services. There is no one correct representation or service binding. XML, relational, and object representations are all important, but no single one will suffice. When we first design the system, we think about "StockNotifications". How do we make them real and use our conceptual understanding of them throughout the system, whether they are stored in a multi-dimensional database for analytics, in a durable queue between systems, in a mid-tier cache, as a business object, etc.?

Figure 3 captures the essence of this issue by focusing on several entities in our order entry system. Note that conceptual level entities have become real. Also note that the conceptual entities are communicating with and mapping to various logical schema formats, e.g. relational for the persistent storage, messages for the durable message queue on the "Submit Order" service, and perhaps XML for the Stock Update and Order Status web services.

Figure 3. Order Entry System circa 2005

SQL Server Evolution

The data services provided by the "data platform" 20 years ago were minimal and focused around the logical schema in an RDBMS. These services included query & update, atomic transactions, and bulk operations such as backup and load/extract.

SQL Server itself is evolving from a traditional RDBMS to a complete data platform that provides a number of high value data services over entities realized at the conceptual schema level. While providing services such as reporting, analysis, and data integration in a single product and realizing synergy amongst them was a conscious business strategy, the means to achieve these services and the resultant ways of describing the entities they operate over happened more organically—many times in response to problems recognized in trying to provide higher level data services over the logical schema level.

There are two great examples of the need for concrete entity representation for services now provided within SQL Server: logical records for merge replication, and the semantic model for report builder.

Early versions of merge replication in SQL Server provided for multi-master replication of individual rows. In this early mode, rows can be updated independently by multiple agents; changes can conflict; and various conflict resolution mechanisms are provided with the model. This row-centric service had a fundamental flaw—it did not capture the fact that there is an implicit consistency guarantee around entities as they flow between systems. In a relational database system the ACID guarantees of the concurrency control system are what prevent chaos. If an order consists of a single order-header and 5 order details, isolation provided by the concurrency control system prevents other agents on the system from seeing or processing the order in an inconsistent state when say only 3 of the 5 order details have been inserted. In the midst of a transaction, the database is often in a logically inconsistent state—for example when only 3 of the 5 order details are inserted or money has been debited from one account before being credited to another in a transfer. Other agents in the system only see the consistent "before state" or "after state". Since merge replication operated at the logical schema or row level it could not capture that the data representing a new order, while inserted in a single isolated transaction on one system, was to be installed in an isolated fashion when replicated. So, if the replication transport failed after transferring 3 out of 5 order details, the order processing system picking up and processing the inconsistent new order had no way of knowing that it was not yet complete. To address this flaw, the replication service introduced "logical records" as a way to describe and define consistency boundaries across entities comprised of multiple related rows at the logical schema level. "Logical records" are defined in the part of the SQL catalog associated with merge replication. There is not a proper design-time tool experience to define a "logical record" such as an Order that includes its Order Details—applications do it through a series of stored procedure invocations.

Report Builder (RB) is another example of SQL Server providing a data service at the conceptual entity level. SQL Server Reporting Services (SSRS) has added incredible value to the SQL product. Since it operates at the logical schema level though, writing reports requires knowing how to compose queries at the logical schema level—e.g. creating an order status report requires knowing how to write the join across the several tables that make up an order. Once SSRS was released, there was incredible demand for an end user report authoring environment that didn't require a developer to fire up Visual Studio and author SQL queries for reports. End users and analysts want to write reports directly over Customers, Orders, Sales, etc. They are business people who think at the business concept, or "domain", level and want to express their queries at this level rather than at the logical schema level. Virtually all "end user" and "English query" reporting environments require this. Thus, the SQL Server team created a means to describe and map conceptual entities to the logical schema layer we call the Semantic Model Definition Language (SMDL).


These are just two of a number of mapping services provided within SQL Server—the Unified Dimensional Model (UDM) provides a multi-dimensional view abstraction over several logical data models. A Data Source View (DSV), on which the BI tools work, also provides conceptual view mapping technology.

The point is that one key driver in SQL Server's success is its delivery of higher level data services at the conceptual schema level. Currently, each of these services has its own tools to describe conceptual entities and map them down to the underlying logical schema level. If you have a Customer in your problem domain, you need to define it one way for merge replication, another way for report builder, and so on.

Figure 4 shows SQL Server as it existed in Version 6.0—essentially a traditional RDBMS with all data centric services provided at the logical schema level. Figure 5 demonstrates the evolution of SQL Server into a data platform with many high value data services and multiple means to reify and map conceptual entities to their underlying logical schemata.

Figure 4. SQL Server 1995


Figure 5. SQL Server 2005

Other Data Services

While we have motivated the need for making the conceptual level real from the perspective of the SQL data platform, it should be apparent that this mapping from real services at the conceptual level to any number of logical and programming/presentation representations is occurring in multiple areas. Consider the web services serialization "Data Contract", which maps a CLR object's state representation to its serialized XML form. If one does "contract first" services development, the nouns in the service contract are really state representations of entities—think back to the example of the "StockNotification". This trend will continue with:

Database "search". Many are working on performing "search" over structured storage—essentially trying to bridge the gap between query languages and search expressions. There are two vexing questions when trying to implement database search over a logical relational schema:

What does one return for a result? Returning disassociated addresses when matching on 98052 as a ZIP code is not very useful. How then do we return something of value when a search expression of "customer zip 98052" is presented? How about "Pizza zip 98052"? Note that making the conceptual level real can both help understand the semantics of the search expression and help shape what is returned as a result.

How does one implement a security scheme over database search? We must assume that the search expression domain is over the entire database. How then do we implement security on what we uncover in the database? Authorizing based upon access to the underlying logical schema only goes so far. The real liberation comes if we are able to raise the authorization scheme up from the logical schema level to the conceptual level—only there can we secure what matters: Customers, Orders and ProductMasters.

Office documents as a content model: Office 12 takes a major step in moving from a presentation format model to a content model. Thus, an Office document can be seen as an entity container bringing together the worlds of structured, semi-structured, and unstructured data. If we had a real conceptual model for an insurance domain, we could create an InfoPath form for claims processing by dragging entities from a designer into an InfoPath design surface. An adjuster can go to a claim scene and populate a form that both creates and references entities such as: PolicyHolder, Claim, Policy, Claimant, etc. They could also capture unstructured data such as a sketch of the scene, photographs of the damage, etc. This rich document can be "Submitted" into an application platform that has a rich conceptual notion of PolicyHolder, Claim, Date, Location, etc. Services can be built directly on top of these concepts—it is easy for someone to write a report listing Incidents occurring at a particular Location over the last year. The claim information can be easily extracted from the document and used as the noun in a web service that initiates a new claim process. Note that both data-centric document consumption into the platform and data-centric document production from the platform are enabled in this new world.

The Vision

So, what then is the vision for next-generation data access? Namely, to acknowledge that significant technology and application trends are moving us towards providing richer services at the conceptual rather than at the logical schema level, and to seize this as an opportunity to provide a broad platform around making the conceptual schema level real in a way that allows us to realize extensive platform value on top of it.

What services should be elevated? Briefly:

1. Data Modeling—we need to provide a data model that can define the structural essence of entities in a way that allows them to be represented in a variety of logical representations. This model must have a concrete representation that allows us to manipulate data at this level, and to build real design-time tools and runtime support around it. The platform needs to have a set-based query and update language for entities and relationships. Note that the entity data model is a conceptual model exposing values, not objects. This is extremely important because the data platform needs to manage data at the highest possible semantic level without tying itself to the particular mechanisms of a presentation layer. This enables other data services such as reporting, replication, and analysis to leverage the mapping and query infrastructure that supports the conceptual level.

2. Mapping—having a concrete entity model allows us to build out a series of mappings to a variety of logical and presentation representations. Certainly there will be default mappings but part of our platform value will be in providing tools to map to and from foreign representations whether it is to federate product catalogs, normalize purchase order forms into our internal canonical standard, or to provide heterogeneous synchronization to a variety of operational systems as a Master Data Management solution.

3. Design-time tools—Entity-based tools today produce models that are mainly destined for the plotter, so that large pictures of a modeling effort can be enshrined on a wall somewhere and quickly fall out of date. Some people use them to produce a logical or physical design for a relational database implementation, but that's about as far as it goes. Our data access tooling should be layered and factored to meet a number of needs over a base entity model:

a. Base entity tooling—this is used to design entities as well as their relationships.

b. Mapping—tooling to map from a conceptual entity into a number of logical (and perhaps physical) representations.

c. Semantic tooling—this can be built for higher level semantic services such as SQL Server Report Builder, where you may introduce synonyms, aliases, translation, and other semantic adornments for natural language and end user query.


4. Transformation runtime—once we can describe and encode mappings, we can build machinery to automatically map entities into multiple representations. We can retrieve a "customer" from the database, realize it as an object, and render it as XML for exposure through a web service, all in a declarative fashion driven by the mapping descriptions.

5. Comprehensive programming model—we need programming models that bridge the gap between different logical representations (XML, relational, objects). In fact, by developing programming languages and APIs at the conceptual level, we will be able to liberate the programmer from the impedance mismatches that exist among different logical models. Further, these programming models should support development of application business logic that can run inside or outside the data store depending on deployment, performance, and scalability requirements of the application.

6. Data services targeting the conceptual level. Some examples include:

a. Synchronization—many entity synchronization services can be made platform level services through this vision.

b. Security—we will want to build out security services at the entity level.

c. Report builder—this service is already taking steps in this direction through the semantic data model (SMDL).

d. Administration—beyond security, operations like archive can benefit from a conceptual perspective.

The force motivating the move towards conceptual level services should be clear at this point. We can see evidence all around from the stovepipes we are building to model, tool, and transform from conceptual entities to logical and physical representations. The Data Access vision that we are proposing seeks to establish a significant platform shift by unifying these stovepipes in a coordinated, cross-product, multi-release quest.

Making the Conceptual Level Real

This section outlines how one may define a conceptual model and work against it. We use a modified version of the Northwind database for familiarity.

Build the Conceptual Model

The first step is to define one's conceptual model. The EDM represents a formal, design- and run-time expression of such a model. The EDM allows you to describe the model in terms of entities and relationships. Ideally, there will be two ways to define a model from scratch: one may define the model explicitly by hand-writing the XML serialized form of the model, as shown below, or one may use a graphical EDM designer tool.

<?xml version="1.0"?>
<Schema Namespace="CNorthwind" xmlns="urn:schemas-microsoft-com:windows:storage">

  <!-- Typical Entity definition: has identity [the key] and some members -->
  <EntityType Name="Product" Key="ProductID">
    <Property Name="ProductID" Type="System.Int32" Nullable="false" />
    <Property Name="ProductName" Type="System.String" Nullable="false" Size="max" />
    <Property Name="QuantityPerUnit" Type="System.String" Nullable="false" Size="max" />
    <Property Name="ReorderLevel" Type="System.Int16" Nullable="false" />
    <Property Name="UnitPrice" Type="System.Decimal" Nullable="false" />
    <Property Name="UnitsInStock" Type="System.Int16" Nullable="false" />
    <Property Name="UnitsOnOrder" Type="System.Int16" Nullable="false" />
  </EntityType>

  <!-- A derived product; we can map TPH, TPC, TPT -->
  <EntityType Name="DiscontinuedProduct" BaseType="Product">
    <Property Name="DiscReason" Type="System.String" Nullable="false" Size="max" />
  </EntityType>

  <!-- A complex type defines structure but no identity. It can be used inline in 0 or more Entity definitions -->
  <ComplexType Name="CtAddress">
    <Property Name="Address" Type="System.String" Nullable="false" Size="max" />
    <Property Name="City" Type="System.String" Nullable="false" Size="max" />
    <Property Name="PostalCode" Type="System.String" Nullable="false" Size="max" />
    <Property Name="Region" Type="System.String" Nullable="false" Size="max" />
    <Property Name="Fax" Type="System.String" Nullable="false" Size="max" />
    <Property Name="Country" Type="System.String" Nullable="false" Size="max" />
    <Property Name="Phone" Type="System.String" Nullable="false" Size="max" />
  </ComplexType>

  <EntityType Name="Customer" Key="CustomerID">
    <!-- Address is a member which references a complex type inline -->
    <Property Name="Address" Type="CNorthwind.CtAddress" Nullable="false" />
    <Property Name="CompanyName" Type="System.String" Nullable="false" Size="max" />
    <Property Name="ContactName" Type="System.String" Nullable="false" Size="max" />
    <Property Name="ContactTitle" Type="System.String" Nullable="false" Size="max" />
    <Property Name="CustomerID" Type="System.String" Nullable="false" Size="max" />
  </EntityType>

  <!-- An example of an association between Product [defined above] and OrderDetails [not shown for the sake of brevity] -->
  <Association Name="Order_DetailsProducts">
    <End Name="Product" Type="Product" Multiplicity="1" />
    <End Name="Order_Details" Type="OrderDetail" Multiplicity="*" />
  </Association>

  <!-- The Entity Container defines the logical encapsulation of EntitySets (sets of (possibly)
       polymorphic instances of a type) and AssociationSets (logical link tables for relating
       two or more entity instances) -->
  <EntityContainerType Name="CNorthwind">
    <Property Name="Products" Type="EntitySet(Product)" />
    <Property Name="Customers" Type="EntitySet(Customer)" />
    <Property Name="Order_Details" Type="EntitySet(OrderDetail)" />
    <Property Name="Orders" Type="EntitySet(Order)" />
    <Property Name="Order_DetailsOrders" Type="RelationshipSet(Order_DetailsOrders)">
      <End Name="Order" Extent="Orders" />
      <End Name="Order_Details" Extent="Order_Details" />
    </Property>
    <Property Name="Order_DetailsProducts" Type="RelationshipSet(Order_DetailsProducts)">
      <End Name="Product" Extent="Products" />
      <End Name="Order_Details" Extent="Order_Details" />
    </Property>
  </EntityContainerType>
</Schema>

Apply the Mapping

Once one has an EDM conceptual model, provided that one has a target store already defined, we can map to the target store's logical schema model.


Figure 6. Entity Data Model for Northwind

Of course, this model can be expressed in SQL DDL; for instance, the Employees table may look like:

CREATE TABLE [dbo].[Employees](
    [EmployeeID] [int] NOT NULL,
    [LastName] [nvarchar](20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
    [FirstName] [nvarchar](10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
    [Title] [nvarchar](30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [Extension] [nvarchar](4) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [Notes] [nvarchar](max) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [ReportsTo] [int] NULL,
    CONSTRAINT [PK_Employees] PRIMARY KEY CLUSTERED ([EmployeeID] ASC)
        WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

As with the conceptual EDM, one can hand-write an explicit mapping or use a mapping tool. Figure 7 is a representation of a mapping design tool in action, mapping the above EDM conceptual model to a modified version of the Northwind logical schema.

The items on the left represent the constructs in the EDM model. The items on the right represent the constructs in the store's logical model. In this example we can see one of the motivations for modifying Northwind: the desire was to reflect a common strategy for vertically partitioning data into separate tables in the store. Even when one partitions data this way, ideally one would still want to reason about the data as a single entity without the need for joins or knowledge of the logical model. An entity like Employee can be defined and mapped across multiple tables in the store. Following is the serialized form of the mapping; note that we map at the EntitySet level and allow table mapping fragments, which express the tables and join conditions that make up an entity within a given EntitySet:

<EntitySetMapping cdm:Name='Employees'>
  <EntityTypeMapping cdm:TypeName='IsTypeOf(CNorthwind.Employee)'>
    <TableMappingFragment cdm:TableName='ContactInfo'>
      <EntityKey>
        <ScalarProperty cdm:Name='EmployeeID' cdm:ColumnName='EmployeeID' />
      </EntityKey>
      <ScalarProperty cdm:Name='Address' cdm:ColumnName='Address' />
      <ScalarProperty cdm:Name='City' cdm:ColumnName='City' />
      <ScalarProperty cdm:Name='Country' cdm:ColumnName='Country' />
      <ScalarProperty cdm:Name='PostalCode' cdm:ColumnName='PostalCode' />
      <ScalarProperty cdm:Name='Region' cdm:ColumnName='Region' />
    </TableMappingFragment>
  </EntityTypeMapping>
  <EntityTypeMapping cdm:TypeName='IsTypeOf(CNorthwind.Employee)'>
    <TableMappingFragment cdm:TableName='Employees'>
      <EntityKey>
        <ScalarProperty cdm:Name='EmployeeID' cdm:ColumnName='EmployeeID' />
      </EntityKey>
      <ScalarProperty cdm:Name='Extension' cdm:ColumnName='Extension' />
      <ScalarProperty cdm:Name='FirstName' cdm:ColumnName='FirstName' />
      <ScalarProperty cdm:Name='LastName' cdm:ColumnName='LastName' />
      <ScalarProperty cdm:Name='Notes' cdm:ColumnName='Notes' />
      <ScalarProperty cdm:Name='ReportsTo' cdm:ColumnName='ReportsTo' />
      <ScalarProperty cdm:Name='Title' cdm:ColumnName='Title' />
    </TableMappingFragment>
  </EntityTypeMapping>
  <EntityTypeMapping cdm:TypeName='IsTypeOf(CNorthwind.Employee)'>
    <TableMappingFragment cdm:TableName='PersonalInfo'>
      <EntityKey>
        <ScalarProperty cdm:Name='EmployeeID' cdm:ColumnName='EmployeeID' />
      </EntityKey>
      <ScalarProperty cdm:Name='BirthDate' cdm:ColumnName='BirthDate' />
      <ScalarProperty cdm:Name='HireDate' cdm:ColumnName='HireDate' />
      <ScalarProperty cdm:Name='HomePhone' cdm:ColumnName='HomePhone' />
      <ScalarProperty cdm:Name='Photo' cdm:ColumnName='Photo' />
      <ScalarProperty cdm:Name='TitleOfCourtesy' cdm:ColumnName='TitleOfCourtesy' />
    </TableMappingFragment>
  </EntityTypeMapping>
</EntitySetMapping>

Figure 7. A conceptual to logical mapping tool

Automatically Generated Classes

Having the conceptual level is indeed sufficient for many applications, as it provides a domain model that is live within the context of a comfortable pattern (ADO.NET commands, connections, and data readers) and allows for great late-bound scenarios since return values are self-describing. Many applications, however, prefer an object programming layer. This can be facilitated through code generation driven from the EDM description. Within Visual Studio, one can leverage an ADO.NET-aware build task to generate the corresponding code. For increased flexibility and data independence between the object and conceptual levels, there may be another level of mapping between classes and the conceptual model. This would enable applications built against these classes to be reused against other versions of the conceptual model, provided a legal map can be defined. For the mapping between classes and the conceptual model, the constraints are minimal and are along the lines of preserving identity and fidelity of round trips. The following diagram illustrates the classes from the same solution.

Figure 8. Automatically generated classes representing a programming level

The mapping between classes and the conceptual model is a straightforward member-wise mapping; a snippet of it is shown below:

<ObjectMapping cdm:CLRTypeName="CNorthwind.Product" cdm:CdmTypeName="CNorthwind.Product">
  <MemberMapping cdm:CLRMember="ProductID" cdm:CLRAlias="ProductID" cdm:CdmMember="ProductID" />
  <MemberMapping cdm:CLRMember="ProductName" cdm:CLRAlias="ProductName" cdm:CdmMember="ProductName" />
  <MemberMapping cdm:CLRMember="QuantityPerUnit" cdm:CLRAlias="QuantityPerUnit" cdm:CdmMember="QuantityPerUnit" />
  <MemberMapping cdm:CLRMember="ReorderLevel" cdm:CLRAlias="ReorderLevel" cdm:CdmMember="ReorderLevel" />
  <MemberMapping cdm:CLRMember="UnitPrice" cdm:CLRAlias="UnitPrice" cdm:CdmMember="UnitPrice" />
  <MemberMapping cdm:CLRMember="UnitsInStock" cdm:CLRAlias="UnitsInStock" cdm:CdmMember="UnitsInStock" />
  <MemberMapping cdm:CLRMember="UnitsOnOrder" cdm:CLRAlias="UnitsOnOrder" cdm:CdmMember="UnitsOnOrder" />
</ObjectMapping>
<ObjectMapping cdm:CLRTypeName="CNorthwind.Shipper" cdm:CdmTypeName="CNorthwind.Shipper">
  <MemberMapping cdm:CLRMember="CompanyName" cdm:CLRAlias="CompanyName" cdm:CdmMember="CompanyName" />
  <MemberMapping cdm:CLRMember="Phone" cdm:CLRAlias="Phone" cdm:CdmMember="Phone" />
  <MemberMapping cdm:CLRMember="ShipperID" cdm:CLRAlias="ShipperID" cdm:CdmMember="ShipperID" />
</ObjectMapping>

Using Objects

One can interact with objects and perform regular Create, Read, Update, Delete (CRUD) operations on them; for example, here we query for all Employees hired after a given point in time:

public void DoObjectQueries(DateTime date)
{
    //--- get a connection
    MapConnection conn = new MapConnectionFactory().GetMapConnection();
    Northwind nw = new Northwind(conn, conn.MetadataWorkspace);

    var employees = from e in nw.Employees
                    where e.HireDate > date
                    select e;

    foreach (Employee e in employees)
    {
        Console.WriteLine(e.FirstName);
    }
}

Similarly, we could give each of these employees a promotion and then persist the change to the store:

MapConnection conn = new MapConnectionFactory().GetMapConnection();
Northwind nw = new Northwind(conn, conn.MetadataWorkspace);

var employees = from e in nw.Employees
                where e.HireDate > date
                select e;

foreach (Employee e in employees)
{
    Console.WriteLine(e.FirstName);
    e.Title = "manager";
}

nw.SaveChanges();

The interesting part of this interaction is the SaveChanges() call on the Northwind instance. The Northwind instance is a specialization of ObjectContext and provides a top-level context for state management and the like. In this particular case, instances retrieved from the store are cached by default, and when we invoke SaveChanges() the tracked changes are pushed to the store through the update pipeline.
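The same context also covers creates and deletes. The sketch below is illustrative only: the AddObject and DeleteObject method names and the Employee members used here are assumptions (the preview API surface may differ), while MapConnection, Northwind, and SaveChanges() follow the pattern of the listings above.

MapConnection conn = new MapConnectionFactory().GetMapConnection();
Northwind nw = new Northwind(conn, conn.MetadataWorkspace);

// Create: attach a new entity instance to the Employees entity set (method name assumed).
Employee newHire = new Employee();
newHire.FirstName = "Nancy";
newHire.LastName = "Davolio";
newHire.HireDate = DateTime.Today;
nw.AddObject("Employees", newHire);

// Delete: mark a previously retrieved instance for removal (method name assumed).
Employee departing = (from e in nw.Employees
                      where e.LastName == "Fuller"
                      select e).First();
nw.DeleteObject(departing);

// A single call pushes all tracked inserts, updates, and deletes through the update pipeline.
nw.SaveChanges();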

Using Values

Many ISVs and framework developers prefer to work directly against a .NET data provider; the Map provider is intended for such usage scenarios. The Map provider has a connection and a command and returns a DbDataReader when one invokes MapCommand.ExecuteReader(). An example of a query using the MapCommand is as follows:

public void DoObjectQueries(DateTime date) {
    //--- get a connection
    using (MapConnection conn = new MapConnectionFactory().GetMapConnection()) {
        conn.Open();
        MapCommand command = conn.CreateCommand();
        command.CommandText = @"
            Select value e
            from Employees as e
            where e.HireDate > @HireDate";
        command.Parameters.Add(new MapParameter("HireDate", date));
        DbDataReader reader = command.ExecuteReader();
        while (reader.Read()) {
            //--- do something interesting here
        }
    }
}

Entity Framework Architecture

This section briefly describes the architecture of the Entity Framework being built as part of ADO.NET. A detailed description of the architecture can be found in [ARCH]. The main functional components of the ADO.NET Entity Framework (see Figure 9) are:

Data source-specific providers. The Entity Framework builds on the ADO.NET data provider model. SqlClient is the storage-specific provider for the Microsoft SQL Server database products, including SQL Server 2000 and SQL Server 2005. WCFClient is a future provider to enable access to data from Web services. In terms of interfaces, the ADO.NET provider model contains Connection, Command, and DataReader objects. A new SqlGen service in the Bridge (mentioned below) generates store-specific SQL text from canonical commands.

Map provider. The Entity Framework includes a new data provider, the Map provider. This provider houses the services implementing the mapping transformation from conceptual to logical constructs. The Map provider represents a value-based, client-side view runtime where data is accessed in terms of EDM entities and relationships and queried using the eSQL language. The Map provider includes the following services:

EDM/eSQL. The Map provider processes and exposes data in terms of the EDM values. Queries and updates are formulated using an entity-based SQL language called eSQL.

Mapping. View mapping, one of the key services of the Map provider, is the subsystem that implements bidirectional views that allow applications to manipulate data in terms of entities and relationships rather than rows and tables. The mapping from tables to entities is specified declaratively through a mapping definition language.

Query and update pipelines. Queries and update requests are specified to the Map provider via its Command object either as eSQL text or as canonical command trees.

Store-specific bridge. The bridge component is a service that supports the query execution capabilities of the query pipeline. The bridge takes a command tree as input and produces an equivalent command tree (or command trees) in terms of query capabilities supported by the underlying store as output.

Metadata services. The metadata service supports all metadata discovery activities of the components running inside the Map provider. All metadata associated with EDM concepts (entities, relationships, entitysets, relationshipsets), store concepts (tables, columns, constraints), and mapping concepts is exposed via metadata interfaces. The metadata services component also serves as a link to the domain modeling tools that support model-driven application design.

Transactions. The Map provider integrates with the transactional capabilities of the underlying stores.

API. The API of the Map provider follows the ADO.NET provider model based on Connection, Command, and DataReader objects. Like other store-specific providers, the Map provider accepts commands in the form of eSQL text or canonical trees. The results of commands are returned as DataReader objects.

Occasionally Connected Components. The Entity Framework enhances the well established disconnected programming model introduced by the ADO.NET DataSet. In addition to enhancing the programming experiences around the typed and un-typed DataSets, the Entity Framework embraces the EDM to provide rich disconnected experiences around cached collections of entities and entitysets.

Embedded Database. The Data Platform will include the capabilities of a low-memory footprint, embeddable database engine to enrich the services for applications that need rich middle-tier caching and disconnected programming experiences. The embedded database will include a simple query processor and non-authoritative persistence capabilities to enable large middle-tier data caches.

Design and Metadata Tools. The data platform integrates with domain designers to enable model-driven application development. The tools include EDM, mapping, and query modelers. Note that mapping tools driven from the EDM have created a series of tool-generated maps from a user-defined entity, such as an Asset or Customer, into a series of logical schema and presentation formats associated with various platform services. These could represent a SharePoint list definition, a binding for a business object, a merge replication logical record, or an XML representation for use in a web service or workflow. In any case, the entity need only be defined a single time, and tooling associated with the data platform can generate the appropriate logical schema definitions.

Programming Layers. ADO.NET allows multiple programming layers to be plugged onto the value-based entity data services layer exposed by the Map provider. The object services component is one such programming layer that surfaces CLR objects. There are multiple mechanisms by which a programming layer may interact with the entity framework. One of the important mechanisms is LINQ expression trees. In the future, we expect other programming surfaces to be built on top of the entity services.

Services. Rich SQL data services such as reporting, replication, and business analysis will be built on top of the Entity Framework.


Figure 9. Entity Framework Architecture

References

[CHEN76] Chen, Peter Pin-Shan. The Entity-Relationship Model—Toward a Unified View of Data. ACM Transactions on Database Systems, Vol. 1, Issue 1, March 1976, pp. 9–36.

[UML] Unified Modeling Language. http://www.uml.org/

[EDM] Entity Data Model. ADO.NET Technical Preview, June 2006. http://msdn.microsoft.com/en-us/library/aa697428(VS.80).aspx

[ARCH] ADO.NET Entity Framework Architecture. ADO.NET Technical Preview, June 2006. http://msdn.microsoft.com/en-us/library/aa697427(VS.80).aspx

[ADO.NET] ADO.NET Tech Preview Overview, June 2006.

© Microsoft Corporation. All rights reserved. http://msdn.microsoft.com/en-us/library/ms369863(VS.80).aspx


Overview of C# 3.0 

March 2007

Anders Hejlsberg, Mads Torgersen

Applies to:   Visual C# 3.0

Summary: Technical overview of C# 3.0 ("C# Orcas"), which introduces several language extensions that build on C# 2.0 to support the creation and use of higher order, functional style class libraries. (38 printed pages)

Contents

Introduction
26.1 Implicitly Typed Local Variables
26.2 Extension Methods
   26.2.1 Declaring Extension Methods
   26.2.2 Available Extension Methods
   26.2.3 Extension Method Invocations
26.3 Lambda Expressions
   26.3.1 Anonymous Method and Lambda Expression Conversions
   26.3.2 Delegate Creation Expressions
   26.3.3 Type Inference
      26.3.3.1 The first phase
      26.3.3.2 The second phase
      26.3.3.3 Input types
      26.3.3.4 Output types
      26.3.3.5 Dependence
      26.3.3.6 Output type inferences
      26.3.3.7 Explicit argument type inferences
      26.3.3.8 Exact inferences
      26.3.3.9 Lower-bound inferences
      26.3.3.10 Fixing
      26.3.3.11 Inferred return type
      26.3.3.12 Type inference for conversion of method groups
      26.3.3.13 Finding the best common type of a set of expressions
   26.3.4 Overload Resolution
26.4 Object and Collection Initializers
   26.4.1 Object Initializers
   26.4.2 Collection Initializers
26.5 Anonymous Types
26.6 Implicitly Typed Arrays
26.7 Query Expressions
   26.7.1 Query Expression Translation
      26.7.1.1 Select and groupby clauses with continuations
      26.7.1.2 Explicit range variable types
      26.7.1.3 Degenerate query expressions
      26.7.1.4 From, let, where, join and orderby clauses
      26.7.1.5 Select clauses
      26.7.1.6 Groupby clauses
      26.7.1.7 Transparent identifiers
   26.7.2 The Query Expression Pattern
26.8 Expression Trees
   26.8.1 Overload Resolution
26.9 Automatically Implemented Properties

Introduction

This article contains some updates that apply to Visual C# 3.0. A comprehensive specification will accompany the release of the language.

C# 3.0 ("C# Orcas") introduces several language extensions that build on C# 2.0 to support the creation and use of higher order, functional style class libraries. The extensions enable construction of compositional APIs with expressive power equal to that of query languages in domains such as relational databases and XML. The extensions include:

Implicitly typed local variables, which permit the type of local variables to be inferred from the expressions used to initialize them.

Extension methods, which make it possible to extend existing types and constructed types with additional methods.

Lambda expressions, an evolution of anonymous methods that provides improved type inference and conversions to both delegate types and expression trees.

Object initializers, which ease construction and initialization of objects.

Anonymous types, which are tuple types automatically inferred and created from object initializers.

Implicitly typed arrays, a form of array creation and initialization that infers the element type of the array from an array initializer.

Query expressions, which provide a language integrated syntax for queries that is similar to relational and hierarchical query languages such as SQL and XQuery.

Expression trees, which permit lambda expressions to be represented as data (expression trees) instead of as code (delegates).

This document is a technical overview of those features. The document makes reference to the C# Language Specification Version 1.2 (§1 through §18) and the C# Language Specification Version 2.0 (§19 through §25), both of which are available on the C# Language Home Page (http://msdn.microsoft.com/vcsharp/aa336809.aspx).

26.1 Implicitly Typed Local Variables

In an implicitly typed local variable declaration, the type of the local variable being declared is inferred from the expression used to initialize the variable. When a local variable declaration specifies var as the type and no type named var is in scope, the declaration is an implicitly typed local variable declaration. For example:

var i = 5;
var s = "Hello";
var d = 1.0;
var numbers = new int[] {1, 2, 3};
var orders = new Dictionary<int,Order>();

The implicitly typed local variable declarations above are precisely equivalent to the following explicitly typed declarations:

int i = 5;
string s = "Hello";
double d = 1.0;
int[] numbers = new int[] {1, 2, 3};
Dictionary<int,Order> orders = new Dictionary<int,Order>();

A local variable declarator in an implicitly typed local variable declaration is subject to the following restrictions:

The declarator must include an initializer.

The initializer must be an expression. The initializer expression must have a compile-time type which cannot be the null type.

The local variable declaration cannot include multiple declarators.

The initializer cannot refer to the declared variable itself.

The following are examples of incorrect implicitly typed local variable declarations:

var x;               // Error, no initializer to infer type from
var y = {1, 2, 3};   // Error, collection initializer not permitted
var z = null;        // Error, null type not permitted
var u = x => x + 1;  // Error, lambda expressions do not have a type
var v = v++;         // Error, initializer cannot refer to variable itself


For reasons of backward compatibility, when a local variable declaration specifies var as the type and a type named var is in scope, the declaration refers to that type. Since a type named var violates the established convention of starting type names with an upper case letter, this situation is unlikely to occur.

The for-initializer of a for statement (§8.8.3) and the resource-acquisition of a using statement (§8.13) can be an implicitly typed local variable declaration. Likewise, the iteration variable of a foreach statement (§8.8.4) may be declared as an implicitly typed local variable, in which case the type of the iteration variable is inferred to be the element type of the collection being enumerated. In the example

int[] numbers = { 1, 3, 5, 7, 9 };
foreach (var n in numbers) Console.WriteLine(n);

the type of n is inferred to be int, the element type of numbers.

Only local-variable-declaration, for-initializer, resource-acquisition and foreach-statement can contain implicitly typed local variable declarations.
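
As a brief sketch of the other permitted contexts (the statements are assumed to appear inside a method; MemoryStream is from System.IO):

// for-initializer: i is inferred to be int
for (var i = 0; i < 10; i++) Console.WriteLine(i);

// resource-acquisition: stream is inferred to be System.IO.MemoryStream
using (var stream = new System.IO.MemoryStream())
{
    stream.WriteByte(0);
}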

26.2 Extension Methods

Extension methods are static methods that can be invoked using instance method syntax. In effect, extension methods make it possible to extend existing types and constructed types with additional methods.

Note   Extension methods are less discoverable and more limited in functionality than instance methods. For those reasons, it is recommended that extension methods be used sparingly and only in situations where instance methods are not feasible or possible. Extension members of other kinds, such as properties, events, and operators, are being considered but are currently not supported.

26.2.1 Declaring Extension Methods

Extension methods are declared by specifying the keyword this as a modifier on the first parameter of the methods. Extension methods can only be declared in non-generic, non-nested static classes. The following is an example of a static class that declares two extension methods:

namespace Acme.Utilities
{
   public static class Extensions
   {
      public static int ToInt32(this string s) {
         return Int32.Parse(s);
      }

      public static T[] Slice<T>(this T[] source, int index, int count) {
         if (index < 0 || count < 0 || source.Length - index < count)
            throw new ArgumentException();
         T[] result = new T[count];
         Array.Copy(source, index, result, 0, count);
         return result;
      }
   }
}

The first parameter of an extension method can have no modifiers other than this, and the parameter type cannot be a pointer type.


Extension methods have all the capabilities of regular static methods. In addition, once imported, extension methods can be invoked using instance method syntax.

26.2.2 Available Extension Methods

Extension methods are available in a namespace if declared in a static class or imported through using-namespace-directives (§9.3.2) in that namespace. In addition to importing the types contained in an imported namespace, a using-namespace-directive thus imports all extension methods in all static classes in the imported namespace.

In effect, available extension methods appear as additional methods on the types that are given by their first parameter and have lower precedence than regular instance methods. For example, when the Acme.Utilities namespace from the example above is imported with the using-namespace-directive

using Acme.Utilities;

it becomes possible to invoke the extension methods in the static class Extensions using instance method syntax:

string s = "1234";
int i = s.ToInt32();              // Same as Extensions.ToInt32(s)
int[] digits = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
int[] a = digits.Slice(4, 3);     // Same as Extensions.Slice(digits, 4, 3)

26.2.3 Extension Method Invocations

The detailed rules for extension method invocation are described in the following. In a method invocation (§7.5.5.1) of one of the forms

expr . identifier ( )
expr . identifier ( args )
expr . identifier < typeargs > ( )
expr . identifier < typeargs > ( args )

if the normal processing of the invocation finds no applicable instance methods (specifically, if the set of candidate methods for the invocation is empty), an attempt is made to process the construct as an extension method invocation. The method invocation is first rewritten to one of the following, respectively:

identifier ( expr )
identifier ( expr , args )
identifier < typeargs > ( expr )
identifier < typeargs > ( expr , args )

The rewritten form is then processed as a static method invocation, except for the way in which identifier is resolved: Starting with the closest enclosing namespace declaration, continuing with each enclosing namespace declaration, and ending with the containing compilation unit, successive attempts are made to process the rewritten method invocation with a method group consisting of all available and accessible extension methods in the namespace with the name given by identifier. From this set remove all the methods that are not applicable (§7.4.2.1) and the ones where no implicit identity, reference or boxing conversion exists from the first argument to the first parameter. The first method group that yields a non-empty such set of candidate methods is the one chosen for the rewritten method invocation, and normal overload resolution (§7.4.2) is applied to select the best extension method from the set of candidates. If all attempts yield empty sets of candidate methods, a compile-time error occurs.

The preceding rules mean that instance methods take precedence over extension methods, and extension methods available in inner namespace declarations take precedence over extension methods available in outer namespace declarations. For example:


public static class E
{
   public static void F(this object obj, int i) { }
   public static void F(this object obj, string s) { }
}

class A { }

class B
{
   public void F(int i) { }
}

class C
{
   public void F(object obj) { }
}

class X
{
   static void Test(A a, B b, C c) {
      a.F(1);          // E.F(object, int)
      a.F("hello");    // E.F(object, string)
      b.F(1);          // B.F(int)
      b.F("hello");    // E.F(object, string)
      c.F(1);          // C.F(object)
      c.F("hello");    // C.F(object)
   }
}

In the example, the B method takes precedence over the first extension method, and the C method takes precedence over both extension methods.

26.3 Lambda Expressions

C# 2.0 introduces anonymous methods, which allow code blocks to be written "in-line" where delegate values are expected. While anonymous methods provide much of the expressive power of functional programming languages, the anonymous method syntax is rather verbose and imperative in nature. Lambda expressions provide a more concise, functional syntax for writing anonymous methods.

A lambda expression is written as a parameter list, followed by the => token, followed by an expression or a statement block.

expression:
   assignment
   non-assignment-expression

non-assignment-expression:
   conditional-expression
   lambda-expression
   query-expression

lambda-expression:
   ( lambda-parameter-listopt ) => lambda-expression-body
   implicitly-typed-lambda-parameter => lambda-expression-body

lambda-parameter-list:
   explicitly-typed-lambda-parameter-list
   implicitly-typed-lambda-parameter-list

explicitly-typed-lambda-parameter-list:
   explicitly-typed-lambda-parameter
   explicitly-typed-lambda-parameter-list , explicitly-typed-lambda-parameter

explicitly-typed-lambda-parameter:
   parameter-modifieropt type identifier

implicitly-typed-lambda-parameter-list:
   implicitly-typed-lambda-parameter
   implicitly-typed-lambda-parameter-list , implicitly-typed-lambda-parameter

implicitly-typed-lambda-parameter:
   identifier

lambda-expression-body:
   expression
   block

The => operator has the same precedence as assignment (=) and is right-associative.
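
As a small illustrative sketch (declaring a Func delegate type like the one used in the examples later in this section), right-associativity means that a curried lambda needs no parentheses around its inner lambda:

delegate R Func<A,R>(A arg);

// x => y => x + y is parsed as x => (y => x + y):
// a function that takes an int and returns a function from int to int.
Func<int,Func<int,int>> add = x => y => x + y;

int five = add(2)(3);   // five == 5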

The parameters of a lambda expression can be explicitly or implicitly typed. In an explicitly typed parameter list, the type of each parameter is explicitly stated. In an implicitly typed parameter list, the types of the parameters are inferred from the context in which the lambda expression occurs—specifically, when the lambda expression is converted to a compatible delegate type, that delegate type provides the parameter types (§26.3.1).

In a lambda expression with a single, implicitly typed parameter, the parentheses may be omitted from the parameter list. In other words, a lambda expression of the form

( param ) => expr

can be abbreviated to

param => expr

Some examples of lambda expressions follow below:

x => x + 1                       // Implicitly typed, expression body
x => { return x + 1; }           // Implicitly typed, statement body
(int x) => x + 1                 // Explicitly typed, expression body
(int x) => { return x + 1; }     // Explicitly typed, statement body
(x, y) => x * y                  // Multiple parameters
() => Console.WriteLine()        // No parameters


In general, the specification of anonymous methods, provided in §21 of the C# 2.0 Specification, also applies to lambda expressions. Lambda expressions are functionally similar to anonymous methods, except for the following points:

Anonymous methods permit the parameter list to be omitted entirely, yielding convertibility to delegate types of any list of parameters.

Lambda expressions permit parameter types to be omitted and inferred whereas anonymous methods require parameter types to be explicitly stated.

The body of a lambda expression can be an expression or a statement block whereas the body of an anonymous method can only be a statement block.

Lambda expressions with an expression body can be converted to expression trees (§26.8).
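
The following sketch contrasts the two forms; the Transformer delegate type is declared here only for illustration:

delegate int Transformer(int x);

// Anonymous method: the parameter list may be omitted entirely
Transformer t1 = delegate { return 42; };

// Anonymous method: parameter types must be stated explicitly
Transformer t2 = delegate(int x) { return x + 1; };

// Lambda expression: the parameter type is inferred, and the body may be a bare expression
Transformer t3 = x => x + 1;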

26.3.1 Anonymous Method and Lambda Expression Conversions

Note   This section replaces §21.3.

An anonymous-method-expression or lambda-expression is classified as a value with special conversion rules. The value does not have a type but can be implicitly converted to a compatible delegate type. Specifically, a delegate type D is compatible with an anonymous method or lambda-expression L provided:

D and L have the same number of parameters.

If L is an anonymous method that does not contain an anonymous-method-signature, then D may have zero or more parameters of any type, as long as no parameter of D has the out parameter modifier.

If L has an explicitly typed parameter list, each parameter in D has the same type and modifiers as the corresponding parameter in L.

If L is a lambda expression that has an implicitly typed parameter list, D has no ref or out parameters.

If D has a void return type and the body of L is an expression, when each parameter of L is given the type of the corresponding parameter in D, the body of L is a valid expression (wrt §7) that would be permitted as a statement-expression (§8.6).

If D has a void return type and the body of L is a statement block, when each parameter of L is given the type of the corresponding parameter in D, the body of L is a valid statement block (wrt §8.2) in which no return statement specifies an expression.

If D has a non-void return type and the body of L is an expression, when each parameter of L is given the type of the corresponding parameter in D, the body of L is a valid expression (wrt §7) that is implicitly convertible to the return type of D.

If D has a non-void return type and the body of L is a statement block, when each parameter of L is given the type of the corresponding parameter in D, the body of L is a valid statement block (wrt §8.2) with a non-reachable end point in which each return statement specifies an expression that is implicitly convertible to the return type of D.

The examples that follow use a generic delegate type Func<A,R> which represents a function taking an argument of type A and returning a value of type R:

delegate R Func<A,R>(A arg);

In the assignments

Func<int,int> f1 = x => x + 1;     // Ok
Func<int,double> f2 = x => x + 1;  // Ok
Func<double,int> f3 = x => x + 1;  // Error

the parameter and return types of each lambda expression are determined from the type of the variable to which the lambda expression is assigned. The first assignment successfully converts the lambda expression to the delegate type Func<int,int> because, when x is given type int, x + 1 is a valid expression that is implicitly convertible to type int. Likewise, the second assignment successfully converts the lambda expression to the delegate type Func<int,double> because the result of x + 1 (of type int) is implicitly convertible to type double. However, the third assignment is a compile-time error because, when x is given type double, the result of x + 1 (of type double) is not implicitly convertible to type int.

26.3.2 Delegate Creation Expressions


Note   This section replaces §21.10.

Delegate creation expressions (§7.5.10.3) are extended to permit the argument to be an expression classified as a method group, an expression classified as an anonymous method or lambda expression, or a value of a delegate type.

The compile-time processing of a delegate-creation-expression of the form new D(E), where D is a delegate-type and E is an expression, consists of the following steps:

If E is a method group, a method group conversion (§21.9) must exist from E to D, and the delegate creation expression is processed in the same way as that conversion.

If E is an anonymous method or lambda expression, an anonymous method or lambda expression conversion (§26.3.1) must exist from E to D, and the delegate creation expression is processed in the same way as that conversion.

If E is a value of a delegate type, the method signature of E must be consistent (§21.9) with D, and the result is a reference to a newly created delegate of type D that refers to the same invocation list as E. If E is not consistent with D, a compile-time error occurs.
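
As a brief sketch of the three forms (the Transformer delegate and the Square method are assumed here purely for illustration):

delegate int Transformer(int x);

static int Square(int x) { return x * x; }

static void Demo()
{
    Transformer t1 = new Transformer(Square);      // E is a method group
    Transformer t2 = new Transformer(x => x + 1);  // E is a lambda expression
    Transformer t3 = new Transformer(t1);          // E is a value of a delegate type;
                                                   // t3 refers to the same invocation list as t1
}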

26.3.3 Type Inference

Note   This section replaces §20.6.4.

When a generic method is called without specifying type arguments, a type inference process attempts to infer type arguments for the call. The presence of type inference allows a more convenient syntax to be used for calling a generic method, and allows the programmer to avoid specifying redundant type information. For example, given the method declaration:

class Chooser
{
   static Random rand = new Random();

   public static T Choose<T>(T first, T second) {
      return (rand.Next(2) == 0) ? first : second;
   }
}

it is possible to invoke the Choose method without explicitly specifying a type argument:

int i = Chooser.Choose(5, 213);             // Calls Choose<int>
string s = Chooser.Choose("foo", "bar");    // Calls Choose<string>

Through type inference, the type arguments int and string are determined from the arguments to the method.

Type inference occurs as part of the compile-time processing of a method invocation (§20.9.7) and takes place before the overload resolution step of the invocation. When a particular method group is specified in a method invocation, and no type arguments are specified as part of the method invocation, type inference is applied to each generic method in the method group. If type inference succeeds, then the inferred type arguments are used to determine the types of arguments for subsequent overload resolution. If overload resolution chooses a generic method as the one to invoke, then the inferred type arguments are used as the actual type arguments for the invocation. If type inference for a particular method fails, that method does not participate in overload resolution. The failure of type inference, in and of itself, does not cause a compile-time error. However, it often leads to a compile-time error when overload resolution then fails to find any applicable methods.
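
The following sketch (the Print methods are assumed here for illustration) shows how a failed inference merely removes a candidate rather than producing an error by itself:

static void Print<T>(T[] items) { foreach (T item in items) Console.WriteLine(item); }
static void Print(string s) { Console.WriteLine(s); }

static void Demo()
{
    Print(new[] { 1, 2, 3 });   // Type inference succeeds: calls Print<int>(int[])
    Print("hello");             // Inference of T from string fails, so Print<T> does not
                                // participate; overload resolution picks Print(string)
}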

If the supplied number of arguments is different than the number of parameters in the method, then inference immediately fails. Otherwise, assume that the generic method has the following signature:

Tr M<X1...Xn>(T1 x1 ... Tm xm)

With a method call of the form M(e1...em) the task of type inference is to find unique type arguments S1...Sn for each of the type parameters X1...Xn so that the call M<S1...Sn>(e1...em) becomes valid.

During the process of inference each type parameter Xi is either fixed to a particular type Si or unfixed with an associated set of bounds. Each of the bounds is some type T. Initially each type variable Xi is unfixed with an empty set of bounds.

Type inference takes place in phases. Each phase will try to infer type arguments for more type variables based on the findings of the previous phase. The first phase makes some initial inferences of bounds, whereas the second phase fixes type variables to specific types and infers further bounds. The second phase may have to be repeated a number of times.

Note   When we refer to delegate types throughout the following, this should be taken to include also types of the form Expression<D> where D is a delegate type. The argument and return types of Expression<D> are those of D.

Note   Type inference takes place not only when a generic method is called. Type inference for conversion of method groups is described in §26.3.3.12 and finding the best common type of a set of expressions is described in §26.3.3.13.

26.3.3.1 The first phase

For each of the method arguments ei:

An explicit argument type inference (§26.3.3.7) is made from ei with type Ti if ei is a lambda expression, an anonymous method, or a method group.

An output type inference (§26.3.3.6) is made from ei with type Ti if ei is not a lambda expression, an anonymous method, or a method group.

26.3.3.2 The second phase

All unfixed type variables Xi that do not depend on (§26.3.3.5) any Xj are fixed (§26.3.3.10).

If no such type variables exist, all unfixed type variables Xi are fixed for which all of the following hold:

There is at least one type variable Xj that depends on Xi.

Xi has a non-empty set of bounds.

If no such type variables exist and there are still unfixed type variables, type inference fails. If no further unfixed type variables exist, type inference succeeds. Otherwise, for all arguments ei with corresponding argument type Ti where the output types (§26.3.3.4) contain unfixed type variables Xj but the input types (§26.3.3.3) do not, an output type inference (§26.3.3.6) is made for ei with type Ti. Then the second phase is repeated.

26.3.3.3 Input types

If e is a method group or implicitly typed lambda expression and T is a delegate type then all the argument types of T are input types of e with type T.

26.3.3.4 Output types

If e is a method group, an anonymous method, a statement lambda or an expression lambda and T is a delegate type then the return type of T is an output type of e with type T.

26.3.3.5 Dependence

An unfixed type variable Xi depends directly on an unfixed type variable Xj if for some argument ek with type Tk, Xj occurs in an input type of ek with type Tk and Xi occurs in an output type of ek with type Tk.

Xj depends on Xi if Xj depends directly on Xi or if Xi depends directly on Xk and Xk depends on Xj. Thus "depends on" is the transitive but not reflexive closure of "depends directly on".

26.3.3.6 Output type inferences

An output type inference is made from an expression e with type T in the following way:

If e is a lambda or anonymous method with inferred return type U (§26.3.3.11) and T is a delegate type with return type Tb, then a lower-bound inference (§26.3.3.9) is made from U for Tb.

Otherwise, if e is a method group and T is a delegate type with parameter types T1...Tk and return type Tb, and overload resolution of e with the types T1...Tk yields a single method with return type U, then a lower-bound inference is made from U for Tb.

Otherwise, if e is an expression with type U, then a lower-bound inference is made from U for T. Otherwise, no inferences are made.

26.3.3.7 Explicit argument type inferences

An explicit argument type inference is made from an expression e with type T in the following way:


If e is an explicitly typed lambda expression or anonymous method with argument types U1...Uk and T is a delegate type with parameter types V1...Vk then for each Ui an exact inference (§26.3.3.8) is made from Ui for the corresponding Vi.

26.3.3.8 Exact inferences

An exact inference from a type U for a type V is made as follows:

If V is one of the unfixed Xi then U is added to the set of bounds for Xi.

Otherwise, if U is an array type Ue[...] and V is an array type Ve[...] of the same rank then an exact inference from Ue to Ve is made.

Otherwise, if V is a constructed type C<V1...Vk> and U is a constructed type C<U1...Uk> then an exact inference is made from each Ui to the corresponding Vi.

Otherwise, no inferences are made.

26.3.3.9 Lower-bound inferences

A lower-bound inference from a type U for a type V is made as follows:

If V is one of the unfixed Xi then U is added to the set of bounds for Xi.

Otherwise, if U is an array type Ue[...] and V is either an array type Ve[...] of the same rank, or if U is a one-dimensional array type Ue[] and V is one of IEnumerable<Ve>, ICollection<Ve> or IList<Ve>, then:

If Ue is known to be a reference type then a lower-bound inference from Ue to Ve is made.

Otherwise, an exact inference from Ue to Ve is made.

Otherwise if V is a constructed type C<V1...Vk> and there is a unique set of types U1...Uk such that a standard implicit conversion exists from U to C<U1...Uk> then an exact inference is made from each Ui for the corresponding Vi.

Otherwise, no inferences are made.

26.3.3.10 Fixing

An unfixed type variable Xi with a set of bounds is fixed as follows.

The set of candidate types Uj starts out as the set of all types in the set of bounds for Xi.

We then examine each bound for Xi in turn. For each bound U of Xi, all types Uj to which there is not a standard implicit conversion from U are removed from the candidate set.

If among the remaining candidate types Uj there is a unique type V from which there is a standard implicit conversion to all the other candidate types, then Xi is fixed to V.

Otherwise, type inference fails.

26.3.3.11 Inferred return type

For purposes of type inference and overload resolution, the inferred return type of a lambda expression or anonymous method e is determined as follows:

If the body of e is an expression, the type of that expression is the inferred return type of e.

If the body of e is a statement block, if the set of expressions in the block's return statements has a best common type, and if that type is not the null type, then that type is the inferred return type of e.

Otherwise, a return type cannot be inferred for e.
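
For instance, in the following sketch the best common type of the return statement expressions is string, so the inferred return type of the statement lambda is string and the conversion to Func<int,string> succeeds (Func is the delegate type used in the examples above):

Func<int,string> describe = x => {
   if (x > 0) return "positive";
   return "non-positive";
};   // inferred return type: string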

As an example of type inference involving lambda expressions, consider the Select extension method declared in the System.Linq.Enumerable class:

namespace System.Linq
{
   public static class Enumerable
   {
      public static IEnumerable<TResult> Select<TSource,TResult>(
         this IEnumerable<TSource> source,
         Func<TSource,TResult> selector)
      {
         foreach (TSource element in source) yield return selector(element);
      }
   }
}

Assuming the System.Linq namespace was imported with a using clause, and given a class Customer with a Name property of type string, the Select method can be used to select the names of a list of customers:

List<Customer> customers = GetCustomerList();
IEnumerable<string> names = customers.Select(c => c.Name);

The extension method invocation (§26.2.3) of Select is processed by rewriting the invocation to a static method invocation:

IEnumerable<string> names = Enumerable.Select(customers, c => c.Name);

Since type arguments were not explicitly specified, type inference is used to infer the type arguments. First, the customers argument is related to the source parameter, inferring TSource to be Customer. Then, using the lambda expression type inference process described above, c is given type Customer, and the expression c.Name is related to the return type of the selector parameter, inferring TResult to be string. Thus, the invocation is equivalent to:

Enumerable.Select<Customer,string>(customers, (Customer c) => c.Name)

and the result is of type IEnumerable<string>.

The following example demonstrates how lambda expression type inference allows type information to "flow" between arguments in a generic method invocation. Given the method:

static Z F<X,Y,Z>(X value, Func<X,Y> f1, Func<Y,Z> f2) {
   return f2(f1(value));
}

type inference for the invocation

double seconds = F("1:15:30", s => TimeSpan.Parse(s), t => t.TotalSeconds);

proceeds as follows: First, the argument "1:15:30" is related to the value parameter, inferring X to be string. Then, the parameter of the first lambda expression, s, is given the inferred type string, and the expression TimeSpan.Parse(s) is related to the return type of f1, inferring Y to be System.TimeSpan. Finally, the parameter of the second lambda expression, t, is given the inferred type System.TimeSpan, and the expression t.TotalSeconds is related to the return type of f2, inferring Z to be double. Thus, the result of the invocation is of type double.

26.3.3.12 Type inference for conversion of method groups

Similar to calls of generic methods, type inference must also be applied when a method group M containing a generic method is assigned to a given delegate type D. Given a method

Tr M<X1...Xn>(T1 x1 ... Tm xm)


and the method group M being assigned to the delegate type D, the task of type inference is to find type arguments S1...Sn so that the expression:

M<S1...Sn>

becomes assignable to D.

Unlike the type inference algorithm for generic method calls, in this case there are only argument types, no argument expressions. In particular, there are no lambda expressions and hence no need for multiple phases of inference.

Instead, all Xi are considered unfixed, and a lower-bound inference is made from each argument type Uj of D to the corresponding parameter type Tj of M. If for any of the Xi no bounds were found, type inference fails. Otherwise, all Xi are fixed to corresponding Si, which are the result of type inference.
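
As a minimal sketch, assuming an Identity method declared here for illustration:

static T Identity<T>(T x) { return x; }

// D is Func<string,string>; a lower-bound inference is made from the argument type
// string of D to the parameter type T of Identity, fixing T to string.
Func<string,string> f = Identity;   // equivalent to Identity<string>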

26.3.3.13 Finding the best common type of a set of expressions

In some cases, a common type needs to be inferred for a set of expressions. In particular, the element types of implicitly typed arrays and the return types of anonymous methods and statement lambdas are found in this way.

Intuitively, given a set of expressions e1...em this inference should be equivalent to calling a method

Tr M<X>(X x1 ... X xm)

with the ei as arguments.

More precisely, the inference starts out with an unfixed type variable X. Output type inferences are then made from each ei with type X. Finally, X is fixed and the resulting type S is the resulting common type for the expressions.
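
For instance, in an implicitly typed array such as the one sketched below, the output type inferences from the element expressions (of types int and double) lead to double being fixed as the resulting common type:

var values = new[] { 1, 2.5, 3 };   // best common type is double, so values is double[]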

26.3.4 Overload Resolution

Lambda expressions in an argument list affect overload resolution in certain situations. Please refer to §7.4.2.3 for the exact rules.

The following example illustrates the effect of lambdas on overload resolution.

class ItemList<T>: List<T>
{
   public int Sum(Func<T,int> selector) {
      int sum = 0;
      foreach (T item in this) sum += selector(item);
      return sum;
   }

   public double Sum(Func<T,double> selector) {
      double sum = 0;
      foreach (T item in this) sum += selector(item);
      return sum;
   }
}

The ItemList<T> class has two Sum methods. Each takes a selector argument, which extracts the value to sum over from a list item. The extracted value can be either an int or a double, and the resulting sum is likewise either an int or a double.

The Sum methods could for example be used to compute sums from a list of detail lines in an order.

class Detail
{
   public int UnitCount;
   public double UnitPrice;
   ...
}

void ComputeSums() {
   ItemList<Detail> orderDetails = GetOrderDetails(...);
   int totalUnits = orderDetails.Sum(d => d.UnitCount);
   double orderTotal = orderDetails.Sum(d => d.UnitPrice * d.UnitCount);
   ...
}

In the first invocation of orderDetails.Sum, both Sum methods are applicable because the lambda expression d => d.UnitCount is compatible with both Func<Detail,int> and Func<Detail,double>. However, overload resolution picks the first Sum method because the conversion to Func<Detail,int> is better than the conversion to Func<Detail,double>.

In the second invocation of orderDetails.Sum, only the second Sum method is applicable because the lambda expression d => d.UnitPrice * d.UnitCount produces a value of type double. Thus, overload resolution picks the second Sum method for that invocation.

26.4 Object and Collection Initializers

An object creation expression (§7.5.10.1) may include an object or collection initializer which initializes the members of the newly created object or the elements of the newly created collection.

object-creation-expression:
   new type ( argument-listopt ) object-or-collection-initializeropt
   new type object-or-collection-initializer

object-or-collection-initializer:
   object-initializer
   collection-initializer

An object creation expression can omit the constructor argument list and enclosing parentheses provided it includes an object or collection initializer. Omitting the constructor argument list and enclosing parentheses is equivalent to specifying an empty argument list.
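
For example, using the Point class shown later in this section, the two declarations below have the same effect:

var a = new Point { X = 0, Y = 1 };     // constructor argument list and parentheses omitted
var b = new Point() { X = 0, Y = 1 };   // equivalent: empty argument list specified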

Execution of an object creation expression that includes an object or collection initializer consists of first invoking the instance constructor and then performing the member or element initializations specified by the object or collection initializer.

It is not possible for an object or collection initializer to refer to the object instance being initialized.

In order to correctly parse object and collection initializers with generics, the disambiguating list of tokens in §20.6.5 must be augmented with the } token.

26.4.1 Object Initializers

An object initializer specifies values for one or more fields or properties of an object.

object-initializer:
   { member-initializer-listopt }
   { member-initializer-list , }

member-initializer-list:
   member-initializer
   member-initializer-list , member-initializer

member-initializer:
   identifier = initializer-value

initializer-value:
   expression
   object-or-collection-initializer

An object initializer consists of a sequence of member initializers, enclosed by { and } tokens and separated by commas. Each member initializer must name an accessible field or property of the object being initialized, followed by an equals sign and an expression or an object or collection initializer. It is an error for an object initializer to include more than one member initializer for the same field or property. It is not possible for the object initializer to refer to the newly created object it is initializing.

A member initializer that specifies an expression after the equals sign is processed in the same way as an assignment (§7.13.1) to the field or property.

A member initializer that specifies an object initializer after the equals sign is a nested object initializer, i.e., an initialization of an embedded object. Instead of assigning a new value to the field or property, the assignments in the nested object initializer are treated as assignments to members of the field or property. Nested object initializers cannot be applied to properties with a value type, or to read-only fields with a value type.

A member initializer that specifies a collection initializer after the equals sign is an initialization of an embedded collection. Instead of assigning a new collection to the field or property, the elements given in the initializer are added to the collection referenced by the field or property. The field or property must be of a collection type that satisfies the requirements specified in §26.4.2.

The following class represents a point with two coordinates:

public class Point
{
   int x, y;

   public int X { get { return x; } set { x = value; } }
   public int Y { get { return y; } set { y = value; } }
}

An instance of Point can be created and initialized as follows:

var a = new Point { X = 0, Y = 1 };

which has the same effect as

var __a = new Point();
__a.X = 0;
__a.Y = 1;
var a = __a;

where __a is an otherwise invisible and inaccessible temporary variable. The following class represents a rectangle created from two points:

public class Rectangle
{
   Point p1, p2;

   public Point P1 { get { return p1; } set { p1 = value; } }
   public Point P2 { get { return p2; } set { p2 = value; } }
}

An instance of Rectangle can be created and initialized as follows:

var r = new Rectangle {
   P1 = new Point { X = 0, Y = 1 },
   P2 = new Point { X = 2, Y = 3 }
};

which has the same effect as

var __r = new Rectangle();
var __p1 = new Point();
__p1.X = 0;
__p1.Y = 1;
__r.P1 = __p1;
var __p2 = new Point();
__p2.X = 2;
__p2.Y = 3;
__r.P2 = __p2;
var r = __r;

where __r, __p1 and __p2 are temporary variables that are otherwise invisible and inaccessible.

If the Rectangle constructor allocates the two embedded Point instances

public class Rectangle
{
   Point p1 = new Point();
   Point p2 = new Point();

   public Point P1 { get { return p1; } }
   public Point P2 { get { return p2; } }
}

the following construct can be used to initialize the embedded Point instances instead of assigning new instances:

var r = new Rectangle {
   P1 = { X = 0, Y = 1 },
   P2 = { X = 2, Y = 3 }
};

which has the same effect as


var __r = new Rectangle();
__r.P1.X = 0;
__r.P1.Y = 1;
__r.P2.X = 2;
__r.P2.Y = 3;
var r = __r;

26.4.2 Collection Initializers

A collection initializer specifies the elements of a collection.

collection-initializer:
   { element-initializer-list }
   { element-initializer-list , }

element-initializer-list:
   element-initializer
   element-initializer-list , element-initializer

element-initializer:
   non-assignment-expression
   { expression-list }

A collection initializer consists of a sequence of element initializers, enclosed by { and } tokens and separated by commas. Each element initializer specifies an element to be added to the collection object being initialized, and consists of a list of expressions enclosed by { and } tokens and separated by commas. A single-expression element initializer can be written without braces, but cannot then be an assignment expression, to avoid ambiguity with member initializers. The non-assignment-expression production is defined in §26.3.

The following is an example of an object creation expression that includes a collection initializer:

List<int> digits = new List<int> { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

The collection object to which a collection initializer is applied must be of a type that implements System.Collections.IEnumerable or a compile-time error occurs. For each specified element in order, the collection initializer invokes the Add method on the target object with the expression list of the element initializer, applying normal overload resolution for each invocation.
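
As a brief sketch, an element initializer may supply several expressions when the target collection's Add method takes several arguments. For instance, Dictionary<TKey,TValue> implements System.Collections.IEnumerable and provides Add(TKey, TValue) (the names and values below are illustrative only):

var ages = new Dictionary<string,int> { { "Chris", 37 }, { "Bob", 42 } };

// has the same effect as

var __ages = new Dictionary<string,int>();
__ages.Add("Chris", 37);
__ages.Add("Bob", 42);
var ages2 = __ages;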

The following class represents a contact with a name and a list of phone numbers:

public class Contact
{
   string name;
   List<string> phoneNumbers = new List<string>();

   public string Name { get { return name; } set { name = value; } }

   public List<string> PhoneNumbers { get { return phoneNumbers; } }
}

A List<Contact> can be created and initialized as follows:


var contacts = new List<Contact> {
   new Contact {
      Name = "Chris Smith",
      PhoneNumbers = { "206-555-0101", "425-882-8080" }
   },
   new Contact {
      Name = "Bob Harris",
      PhoneNumbers = { "650-555-0199" }
   }
};

which has the same effect as

var contacts = new List<Contact>();
var __c1 = new Contact();
__c1.Name = "Chris Smith";
__c1.PhoneNumbers.Add("206-555-0101");
__c1.PhoneNumbers.Add("425-882-8080");
contacts.Add(__c1);
var __c2 = new Contact();
__c2.Name = "Bob Harris";
__c2.PhoneNumbers.Add("650-555-0199");
contacts.Add(__c2);

where __c1 and __c2 are temporary variables that are otherwise invisible and inaccessible.

26.5 Anonymous Types

C# 3.0 permits the new operator to be used with an anonymous object initializer to create an object of an anonymous type.

primary-no-array-creation-expression:
   ...
   anonymous-object-creation-expression

anonymous-object-creation-expression:
   new anonymous-object-initializer

anonymous-object-initializer:
   { member-declarator-listopt }
   { member-declarator-list , }

member-declarator-list:
   member-declarator
   member-declarator-list , member-declarator

member-declarator:
   simple-name
   member-access
   identifier = expression

An anonymous object initializer declares an anonymous type and returns an instance of that type. An anonymous type is a nameless class type that inherits directly from object. The members of an anonymous type are a sequence of read/write properties inferred from the object initializer(s) used to create instances of the type. Specifically, an anonymous object initializer of the form

new { p1 = e1 , p2 = e2 , ... pn = en }

declares an anonymous type of the form

class __Anonymous1
{
   private T1 f1 ;
   private T2 f2 ;
   ...
   private Tn fn ;

   public T1 p1 { get { return f1 ; } set { f1 = value ; } }
   public T2 p2 { get { return f2 ; } set { f2 = value ; } }
   ...
   public Tn pn { get { return fn ; } set { fn = value ; } }
}

where each Tx is the type of the corresponding expression ex. It is a compile-time error for an expression in an anonymous object initializer to be of the null type or an unsafe type.

The name of an anonymous type is automatically generated by the compiler and cannot be referenced in program text.

Within the same program, two anonymous object initializers that specify a sequence of properties of the same names and compile-time types in the same order will produce instances of the same anonymous type. (This definition includes the order of the properties because it is observable and material in certain circumstances, such as reflection).

In the example

var p1 = new { Name = "Lawnmower", Price = 495.00 };
var p2 = new { Name = "Shovel", Price = 26.95 };
p1 = p2;

the assignment on the last line is permitted because p1 and p2 are of the same anonymous type.

The Equals and GetHashCode methods on anonymous types are defined in terms of the Equals and GetHashCode of the properties, so that two instances of the same anonymous type are equal if and only if all their properties are equal.
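
As a small sketch continuing the example above:

var p3 = new { Name = "Lawnmower", Price = 495.00 };
Console.WriteLine(p1.Equals(p3));                        // True: all properties are equal
Console.WriteLine(p1.GetHashCode() == p3.GetHashCode()); // True for equal instances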

A member declarator can be abbreviated to a simple name (§7.5.2) or a member access (§7.5.4). This is called a projection initializer and is shorthand for a declaration of and assignment to a property with the same name. Specifically, member declarators of the forms

identifier
expr . identifier


are precisely equivalent to the following, respectively:

identifier = identifier
identifier = expr . identifier

Thus, in a projection initializer the identifier selects both the value and the field or property to which the value is assigned. Intuitively, a projection initializer projects not just a value, but also the name of the value.
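
As a brief sketch (the Customer class with Name and Phone properties, and the GetCustomer helper, are assumed here for illustration):

Customer c = GetCustomer();        // hypothetical helper returning a Customer

var v = new { c.Name, c.Phone };
// equivalent to:
// var v = new { Name = c.Name, Phone = c.Phone };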

26.6 Implicitly Typed Arrays

The syntax of array creation expressions (§7.5.10.2) is extended to support implicitly typed array creation expressions:

array-creation-expression:
   ...
   new [ ] array-initializer

In an implicitly typed array creation expression, the type of the array instance is inferred from the elements specified in the array initializer. Specifically, the set formed by the types of the expressions in the array initializer must contain exactly one type to which each type in the set is implicitly convertible, and if that type is not the null type, an array of that type is created. If exactly one type cannot be inferred, or if the inferred type is the null type, a compile-time error occurs.

The following are examples of implicitly typed array creation expressions:

var a = new[] { 1, 10, 100, 1000 };          // int[]
var b = new[] { 1, 1.5, 2, 2.5 };            // double[]
var c = new[] { "hello", null, "world" };    // string[]
var d = new[] { 1, "one", 2, "two" };        // Error

The last expression causes a compile-time error because neither int nor string is implicitly convertible to the other. An explicitly typed array creation expression must be used in this case, for example specifying the type to be object[]. Alternatively, one of the elements can be cast to a common base type, which would then become the inferred element type.
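
For example, either of the following resolves the error in the last declaration above:

object[] d1 = new object[] { 1, "one", 2, "two" };   // explicitly typed array creation
var d2 = new[] { (object)1, "one", 2, "two" };       // object[]: every element type is implicitly
                                                     // convertible to the type of the cast element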

Implicitly typed array creation expressions can be combined with anonymous object initializers to create anonymously typed data structures. For example:

var contacts = new[] {
   new {
      Name = "Chris Smith",
      PhoneNumbers = new[] { "206-555-0101", "425-882-8080" }
   },
   new {
      Name = "Bob Harris",
      PhoneNumbers = new[] { "650-555-0199" }
   }
};

26.7 Query Expressions


Query expressions provide a language integrated syntax for queries that is similar to relational and hierarchical query languages such as SQL and XQuery.

query-expression:
   from-clause query-body

from-clause:
   from typeopt identifier in expression

query-body:
   query-body-clausesopt select-or-group-clause query-continuationopt

query-body-clauses:
   query-body-clause
   query-body-clauses query-body-clause

query-body-clause:
   from-clause
   let-clause
   where-clause
   join-clause
   join-into-clause
   orderby-clause

let-clause:
   let identifier = expression

where-clause:
   where boolean-expression

join-clause:
   join typeopt identifier in expression on expression equals expression

join-into-clause:
   join typeopt identifier in expression on expression equals expression into identifier

orderby-clause:
   orderby orderings

orderings:
   ordering
   orderings , ordering

ordering:
   expression ordering-directionopt

ordering-direction:
   ascending
   descending

select-or-group-clause:
   select-clause
   group-clause

select-clause:
   select expression

group-clause:
   group expression by expression

query-continuation:
   into identifier query-body

A query-expression is classified as a non-assignment-expression, the definition of which occurs in §26.3.

A query expression begins with a from clause and ends with either a select or group clause. The initial from clause can be followed by zero or more from, let, where or join clauses. Each from clause is a generator introducing a range variable ranging over a sequence. Each let clause computes a value and introduces an identifier representing that value. Each where clause is a filter that excludes items from the result. Each join clause compares specified keys of the source sequence with keys of another sequence, yielding matching pairs. Each orderby clause reorders items according to specified criteria. The final select or group clause specifies the shape of the result in terms of the range variable(s). Finally, an into clause can be used to "splice" queries by treating the results of one query as a generator in a subsequent query.
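
The following sketch (assuming a Customer class with City, Name, and Phone properties, in the spirit of the earlier examples) shows several of these clauses together:

var londonContacts =
   from c in customers              // generator: range variable c over customers
   where c.City == "London"         // filter
   orderby c.Name                   // ordering
   select new { c.Name, c.Phone };  // shape of the result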

Ambiguities in query expressions

Query expressions contain a number of new contextual keywords, i.e., identifiers that have special meaning in a given context. Specifically these are: from, join, on, equals, into, let, orderby, ascending, descending, select, group and by. In order to avoid ambiguities caused by mixed use of these identifiers as keywords or simple names in query expressions, they are considered keywords anywhere within a query expression.

For this purpose, a query expression is any expression starting with "from identifier" followed by any token except ";", "=" or ",".

In order to use these words as identifiers within a query expression, they can be prefixed with "@" (§2.4.2).
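
As a small sketch, a contextual keyword can still be used as a range variable name by escaping it:

string[] words = { "alpha", "beta" };
var q = from @group in words select @group;   // @group is the ordinary identifier "group"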

26.7.1 Query Expression Translation

The C# 3.0 language does not specify the exact execution semantics of query expressions. Rather, C# 3.0 translates query expressions into invocations of methods that adhere to the query expression pattern. Specifically, query expressions are translated into invocations of methods named Where, Select, SelectMany, Join, GroupJoin, OrderBy, OrderByDescending, ThenBy, ThenByDescending, GroupBy, and Cast that are expected to have particular signatures and result types, as described in §26.7.2. These methods can be instance methods of the object being queried or extension methods that are external to the object, and they implement the actual execution of the query.

The translation from query expressions to method invocations is a syntactic mapping that occurs before any type binding or overload resolution has been performed. The translation is guaranteed to be syntactically correct, but it is not guaranteed to produce semantically correct C# code. Following translation of query expressions, the resulting method invocations are processed as regular method invocations, and this may in turn uncover errors, for example if the methods do not exist, if arguments have wrong types, or if the methods are generic and type inference fails.
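
For instance (a hypothetical illustration), a syntactically well-formed query over a source with no query methods is translated successfully but then fails to bind:

int n = 5;

// Translated to ( n ) . Select ( x => x ) per §26.7.1.3; the resulting
// invocation is an error because int has no applicable Select method.
var q = from x in n select x;     // compile-time error after translation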

A query expression is processed by repeatedly applying the following translations until no further reductions are possible. The translations are listed in order of precedence: each section assumes that the translations in the preceding sections have been performed exhaustively.

Certain translations inject range variables with transparent identifiers denoted by *. The special properties of transparent identifiers are discussed further in §26.7.1.7.

26.7.1.1 Select and groupby clauses with continuations

A query expression with a continuation

from ... into x ...

is translated into

from x in ( from ... ) ...

The translations in the following sections assume that queries have no into continuations.

The example

from c in customers
group c by c.Country into g
select new { Country = g.Key, CustCount = g.Count() }

is translated into

from g in
   from c in customers
   group c by c.Country
select new { Country = g.Key, CustCount = g.Count() }

the final translation of which is

customers.
GroupBy(c => c.Country).
Select(g => new { Country = g.Key, CustCount = g.Count() })

26.7.1.2 Explicit range variable types

A from clause that explicitly specifies a range variable type

from T x in e

is translated into

from x in ( e ) . Cast < T > ( )

A join clause that explicitly specifies a range variable type

join T x in e on k1 equals k2

is translated into

join x in ( e ) . Cast < T > ( ) on k1 equals k2

The translations in the following sections assume that queries have no explicit range variable types.

The example

from Customer c in customers
where c.City == "London"
select c

is translated into

from c in customers.Cast<Customer>()
where c.City == "London"
select c

the final translation of which is

customers.Cast<Customer>().Where(c => c.City == "London")

Explicit range variable types are useful for querying collections that implement the non-generic IEnumerable interface, but not the generic IEnumerable<T> interface. In the example above, this would be the case if customers were of type ArrayList.
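
For example (a sketch assuming customers is a non-generic System.Collections.ArrayList obtained from a hypothetical GetCustomerList helper), the explicit element type supplies the type information that the non-generic source cannot:

// customers is only IEnumerable, so the Customer element type must be stated explicitly.
ArrayList customers = GetCustomerList();   // hypothetical helper returning an ArrayList

// Cast<Customer>() produces an IEnumerable<Customer>, after which Where binds as usual.
var londonCustomers =
   from Customer c in customers
   where c.City == "London"
   select c;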

26.7.1.3 Degenerate query expressions

A query expression of the form

from x in e select x

is translated into

( e ) . Select ( x => x )

The example

from c in customers
select c

is translated into

customers.Select(c => c)

A degenerate query expression is one that trivially selects the elements of the source. A later phase of the translation removes degenerate queries introduced by other translation steps by replacing them with their source. It is important however to ensure that the result of a query expression is never the source object itself, as that would reveal the type and identity of the source to the client of the query. Therefore this step protects degenerate queries written directly in source code by explicitly calling Select on the source. It is then up to the implementers of Select and other query operators to ensure that these methods never return the source object itself.
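
As a sketch (assuming the Standard Query Operators over IEnumerable<T> as the underlying implementation), the protected degenerate query therefore yields a new sequence object rather than the source collection itself:

List<int> numbers = new List<int> { 1, 2, 3 };

// Written directly in source, the degenerate query is translated to numbers.Select(n => n),
// so the result is a wrapping iterator, not the list.
var q = from n in numbers select n;

bool sameObject = object.ReferenceEquals(q, numbers);   // false: the source is not exposed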

26.7.1.4 From, let, where, join and orderby clauses

A query expression with a second from clause followed by a select clause

from x1 in e1

from x2 in e2

select v

is translated into

( e1 ) . SelectMany( x1 => e2 , ( x1 , x2 ) => v )

A query expression with a second from clause followed by something other than a select clause:

from x1 in e1

from x2 in e2

...

is translated into

from * in ( e1 ) . SelectMany( x1 => e2 , ( x1 , x2 ) => new { x1 , x2 } )
...

A query expression with a let clause

from x in e
let y = f
...

is translated into

from * in ( e ) . Select ( x => new { x , y = f } )
...

A query expression with a where clause

from x in e
where f
...

is translated into

from x in ( e ) . Where ( x => f )
...

A query expression with a join clause without an into followed by a select clause

from x1 in e1

join x2 in e2 on k1 equals k2

select v

is translated into

( e1 ) . Join( e2 , x1 => k1 , x2 => k2 , ( x1 , x2 ) => v )

A query expression with a join clause without an into followed by something other than a select clause

from x1 in e1

join x2 in e2 on k1 equals k2
...

is translated into

from * in ( e1 ) . Join( e2 , x1 => k1 , x2 => k2 , ( x1 , x2 ) => new { x1 , x2 } )
...

A query expression with a join clause with an into followed by a select clause

from x1 in e1

join x2 in e2 on k1 equals k2 into g
select v

is translated into

( e1 ) . GroupJoin( e2 , x1 => k1 , x2 => k2 , ( x1 , g ) => v )

A query expression with a join clause with an into followed by something other than a select clause

from x1 in e1

join x2 in e2 on k1 equals k2 into g
...

is translated into

from * in ( e1 ) . GroupJoin( e2 , x1 => k1 , x2 => k2 , ( x1 , g ) => new { x1 , g } )
...

A query expression with an orderby clause

from x in e
orderby k1 , k2 , ... , kn
...

is translated into

from x in ( e ) . OrderBy ( x => k1 ) . ThenBy ( x => k2 ) . ... . ThenBy ( x => kn )
...

If an ordering clause specifies a descending direction indicator, an invocation of OrderByDescending or ThenByDescending is produced instead.
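
For example, applying this rule, a query expression whose first ordering is descending

from x in e
orderby k1 descending , k2
...

is translated into

from x in ( e ) . OrderByDescending ( x => k1 ) . ThenBy ( x => k2 )
...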

The translations in the following sections assume that there are no let, where, join or orderby clauses, and no more than the one initial from clause in each query expression.

The example

from c in customers
from o in c.Orders
select new { c.Name, o.OrderID, o.Total }

is translated into

customers.SelectMany(c => c.Orders, (c,o) => new { c.Name, o.OrderID, o.Total })

The example

from c in customers
from o in c.Orders
orderby o.Total descending
select new { c.Name, o.OrderID, o.Total }

is translated into

from * in customers.
   SelectMany(c => c.Orders, (c,o) => new { c, o })
orderby o.Total descending
select new { c.Name, o.OrderID, o.Total }

the final translation of which is

customers.
SelectMany(c => c.Orders, (c,o) => new { c, o }).
OrderByDescending(x => x.o.Total).
Select(x => new { x.c.Name, x.o.OrderID, x.o.Total })

where x is a compiler generated identifier that is otherwise invisible and inaccessible.

The example

from o in orders
let t = o.Details.Sum(d => d.UnitPrice * d.Quantity)
where t >= 1000
select new { o.OrderID, Total = t }

is translated into

from * in orders.
   Select(o => new { o, t = o.Details.Sum(d => d.UnitPrice * d.Quantity) })
where t >= 1000
select new { o.OrderID, Total = t }

the final translation of which is

orders.
Select(o => new { o, t = o.Details.Sum(d => d.UnitPrice * d.Quantity) }).
Where(x => x.t >= 1000).
Select(x => new { x.o.OrderID, Total = x.t })

where x is a compiler generated identifier that is otherwise invisible and inaccessible.

The example

from c in customers
join o in orders on c.CustomerID equals o.CustomerID
select new { c.Name, o.OrderDate, o.Total }

is translated into

customers.Join(orders, c => c.CustomerID, o => o.CustomerID,
   (c, o) => new { c.Name, o.OrderDate, o.Total })

The example

from c in customers
join o in orders on c.CustomerID equals o.CustomerID into co
let n = co.Count()
where n >= 10
select new { c.Name, OrderCount = n }

is translated into

from * in customers.
   GroupJoin(orders, c => c.CustomerID, o => o.CustomerID,
             (c, co) => new { c, co })
let n = co.Count()
where n >= 10
select new { c.Name, OrderCount = n }

the final translation of which is

customers.
GroupJoin(orders, c => c.CustomerID, o => o.CustomerID,
          (c, co) => new { c, co }).
Select(x => new { x, n = x.co.Count() }).
Where(y => y.n >= 10).
Select(y => new { y.x.c.Name, OrderCount = y.n })

where x and y are compiler generated identifiers that are otherwise invisible and inaccessible.

The example

from o in orders
orderby o.Customer.Name, o.Total descending
select o

has the final translation

orders.OrderBy(o => o.Customer.Name).ThenByDescending(o => o.Total)

26.7.1.5 Select clauses

A query expression of the form

from x in e select v

is translated into

( e ) . Select ( x => v )

except when v is the identifier x, the translation is simply

( e )

For example

from c in customers.Where(c => c.City == "London")
select c

is simply translated into

customers.Where(c => c.City == "London")

26.7.1.6 Groupby clauses

A query expression of the form

from x in e group v by k

is translated into

( e ) . GroupBy ( x => k , x => v )

except when v is the identifier x, the translation is

( e ) . GroupBy ( x => k )

The example

from c in customers
group c.Name by c.Country

is translated into

customers.GroupBy(c => c.Country, c => c.Name)
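
Conversely, when the grouped value is the range variable itself, the example

from c in customers
group c by c.Country

is translated into

customers.GroupBy(c => c.Country)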

26.7.1.7 Transparent identifiers

Certain translations inject range variables with transparent identifiers denoted by *. Transparent identifiers are not a proper language feature; they exist only as an intermediate step in the query expression translation process.

When a query translation injects a transparent identifier, further translation steps propagate the transparent identifier into lambda expressions and anonymous object initializers. In those contexts, transparent identifiers have the following behavior:

When a transparent identifier occurs as a parameter in a lambda expression, the members of the associated anonymous type are automatically in scope in the body of the lambda expression.

When a member with a transparent identifier is in scope, its members are in scope as well.

When a transparent identifier occurs as a member declarator in an anonymous object initializer, it introduces a member with a transparent identifier.

The example

from c in customers
from o in c.Orders
orderby o.Total descending
select new { c.Name, o.Total }

is translated into

from * in
   from c in customers
   from o in c.Orders
   select new { c, o }
orderby o.Total descending
select new { c.Name, o.Total }

which is further translated into

customers.
SelectMany(c => c.Orders.Select(o => new { c, o })).
OrderByDescending(* => o.Total).
Select(* => new { c.Name, o.Total })

which, when transparent identifiers are erased, is equivalent to

customers.
SelectMany(c => c.Orders.Select(o => new { c, o })).
OrderByDescending(x => x.o.Total).
Select(x => new { x.c.Name, x.o.Total })

where x is a compiler generated identifier that is otherwise invisible and inaccessible.

The example

from c in customers
join o in orders on c.CustomerID equals o.CustomerID
join d in details on o.OrderID equals d.OrderID
join p in products on d.ProductID equals p.ProductID
select new { c.Name, o.OrderDate, p.ProductName }

is translated into

from * in
   from * in
      from * in
         from c in customers
         join o in orders on c.CustomerID equals o.CustomerID
         select new { c, o }
      join d in details on o.OrderID equals d.OrderID
      select new { *, d }
   join p in products on d.ProductID equals p.ProductID
   select new { *, p }
select new { c.Name, o.OrderDate, p.ProductName }

which is further reduced to

customers.
Join(orders, c => c.CustomerID, o => o.CustomerID,
     (c, o) => new { c, o }).
Join(details, * => o.OrderID, d => d.OrderID,
     (*, d) => new { *, d }).
Join(products, * => d.ProductID, p => p.ProductID,
     (*, p) => new { *, p }).
Select(* => new { c.Name, o.OrderDate, p.ProductName })

the final translation of which is

customers.
Join(orders, c => c.CustomerID, o => o.CustomerID,
     (c, o) => new { c, o }).
Join(details, x => x.o.OrderID, d => d.OrderID,
     (x, d) => new { x, d }).
Join(products, y => y.d.ProductID, p => p.ProductID,
     (y, p) => new { y, p }).
Select(z => new { z.y.x.c.Name, z.y.x.o.OrderDate, z.p.ProductName })

where x, y, and z are compiler generated identifiers that are otherwise invisible and inaccessible.

26.7.2 The Query Expression Pattern

The Query Expression Pattern establishes a pattern of methods that types can implement to support query expressions. Because query expressions are translated to method invocations by means of a syntactic mapping, types have considerable flexibility in how they implement the query expression pattern. For example, the methods of the pattern can be implemented as instance methods or as extension methods because the two have the same invocation syntax, and the methods can request delegates or expression trees because lambda expressions are convertible to both.

The recommended shape of a generic type C<T> that supports the query expression pattern is shown below. A generic type is used in order to illustrate the proper relationships between parameter and result types, but it is possible to implement the pattern for non-generic types as well.

delegate R Func<T1,R>(T1 arg1);

delegate R Func<T1,T2,R>(T1 arg1, T2 arg2);

class C
{
   public C<T> Cast<T>();
}

class C<T>
{
   public C<T> Where(Func<T,bool> predicate);
   public C<U> Select<U>(Func<T,U> selector);
   public C<V> SelectMany<U,V>(Func<T,C<U>> selector,
                               Func<T,U,V> resultSelector);
   public C<V> Join<U,K,V>(C<U> inner, Func<T,K> outerKeySelector,
                           Func<U,K> innerKeySelector, Func<T,U,V> resultSelector);
   public C<V> GroupJoin<U,K,V>(C<U> inner, Func<T,K> outerKeySelector,
                                Func<U,K> innerKeySelector, Func<T,C<U>,V> resultSelector);
   public O<T> OrderBy<K>(Func<T,K> keySelector);
   public O<T> OrderByDescending<K>(Func<T,K> keySelector);
   public C<G<K,T>> GroupBy<K>(Func<T,K> keySelector);
   public C<G<K,E>> GroupBy<K,E>(Func<T,K> keySelector,
                                 Func<T,E> elementSelector);
}

class O<T> : C<T>
{
   public O<T> ThenBy<K>(Func<T,K> keySelector);
   public O<T> ThenByDescending<K>(Func<T,K> keySelector);
}

class G<K,T> : C<T>
{
   public K Key { get; }
}

The methods above use the generic delegate types Func<T1, R> and Func<T1, T2, R>, but they could equally well have used other delegate or expression tree types with the same relationships in parameter and result types.

Notice the recommended relationship between C<T> and O<T> which ensures that the ThenBy and ThenByDescending methods are available only on the result of an OrderBy or OrderByDescending. Also notice the recommended shape of the result of GroupBy—a sequence of sequences, where each inner sequence has an additional Key property.

The Standard Query Operators (described in a separate specification) provide an implementation of the query operator pattern for any type that implements the System.Collections.Generic.IEnumerable<T> interface.
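
As a sketch (assuming the shipped location of the Standard Query Operators, the System.Linq namespace, and a hypothetical Demo program), any in-memory collection implementing IEnumerable<T> can therefore be queried directly:

using System;
using System.Collections.Generic;
using System.Linq;   // assumed: Standard Query Operators for IEnumerable<T>

class Demo
{
   static void Main()
   {
      List<string> cities = new List<string> { "London", "Paris", "Lima", "Lagos" };

      // Translated to cities.Where(c => c.StartsWith("L")).OrderBy(c => c),
      // which binds to the IEnumerable<string> extension methods.
      var query = from c in cities
                  where c.StartsWith("L")
                  orderby c
                  select c;

      foreach (string c in query)
         Console.WriteLine(c);   // Lagos, Lima, London
   }
}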

26.8 Expression Trees

Expression trees permit lambda expressions to be represented as data structures instead of executable code. A lambda expression that is convertible to a delegate type D is also convertible to an expression tree of type System.Query.Expression<D>. Whereas the conversion of a lambda expression to a delegate type causes executable code to be generated and referenced by a delegate, conversion to an expression tree type causes code that creates an expression tree instance to be emitted. Expression trees are efficient in-memory data representations of lambda expressions and make the structure of the expression transparent and explicit.

The following example represents a lambda expression both as executable code and as an expression tree. Because a conversion exists to Func<int,int>, a conversion also exists to Expression<Func<int,int>>.

Func<int,int> f = x => x + 1;                 // Code
Expression<Func<int,int>> e = x => x + 1;     // Data

Following these assignments, the delegate f references a method that returns x + 1, and the expression tree e references a data structure that describes the expression x + 1.
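
For example (a sketch; the Body, Parameters, and Compile members shown are those exposed by the expression tree type in the released libraries and are assumed here), the delegate can be invoked directly, whereas the expression tree is data that can be inspected or compiled:

int a = f(3);                     // 4: invokes the generated code

// e.Body and e.Parameters describe the expression x + 1 as a data structure;
// Compile() (assumed API) converts the tree back into an executable delegate.
Func<int,int> g = e.Compile();
int b = g(3);                     // 4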

26.8.1 Overload Resolution

For the purpose of overload resolution there are special rules regarding the Expression<D> types. Specifically the following rule is added to the definition of betterness:

Expression<D1> is better than Expression<D2> if and only if D1 is better than D2

Note that there is no betterness rule between Expression<D> and delegate types.

26.9 Automatically Implemented Properties

Oftentimes properties are implemented by trivial use of a backing field, as in the following example:

public class Point
{
   private int x;
   private int y;

   public int X { get { return x; } set { x = value; } }
   public int Y { get { return y; } set { y = value; } }
}

Automatically implemented (auto-implemented) properties automate this pattern. More specifically, non-abstract property declarations are allowed to have semicolon accessor bodies. Both accessors must be present and both must have semicolon bodies, but they can have different accessibility modifiers. When a property is specified like this, a backing field will automatically be generated for the property, and the accessors will be implemented to read from and write to that backing field. The name of the backing field is compiler generated and inaccessible to the user.

The following declaration is equivalent to the example above:

public class Point
{
   public int X { get; set; }
   public int Y { get; set; }
}

Because the backing field is inaccessible, it can be read and written only through the property accessors. This means that auto-implemented read-only or write-only properties do not make sense, and are disallowed. It is however possible to set the access level of each accessor differently. Thus, the effect of a read-only property with a private backing field can be mimicked like this:

public class ReadOnlyPoint
{
   public int X { get; private set; }
   public int Y { get; private set; }

   public ReadOnlyPoint(int x, int y) { X = x; Y = y; }
}

This restriction also means that definite assignment of struct types with auto-implemented properties can only be achieved using the standard constructor of the struct, since assigning to the property itself requires the struct to be definitely assigned.
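
A minimal sketch (using a hypothetical PointValue struct): chaining to the standard constructor with : this() definitely assigns the compiler-generated backing fields before the auto-implemented properties are set.

public struct PointValue
{
   public int X { get; private set; }
   public int Y { get; private set; }

   // Without ": this()" the assignments below would be rejected, because
   // setting a property requires the struct to already be definitely assigned.
   public PointValue(int x, int y) : this()
   {
      X = x;
      Y = y;
   }
}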
