SharePoint is a very powerful platform. It gives you a very easy-to-setup place to put your data in.

And you know what happens when you have a tool like SharePoint? People use it! And then when people have been putting in data, they want to retrieve it, in all sorts of weird ways. Putting in data is only half the story, and I’d argue the easier part. It is fetching the data in a meaningful and targeted manner that separates the wheat from chaff.

Business users can be amazing. They like to hit us with scenarios we could not have imagined in our wildest dreams when we thought we had all requirements figured out. Much to my chagrin, I have heard the following, “Oh yeah, that is a new requirement!”

It is thus important that the tools developers use allow them the agility and flexibility to satisfy such needs and changing requirements.

At the very heart of it, every system is basically data in and data out. Sure, standard adages apply, good architecture, garbage in, garbage out, etc. But SharePoint builds upon the experience of many years, and thus puts in place a number of things that you, the developer otherwise would have had to worry about. For instance, every piece of content that goes into SharePoint tracks who put it there, when, who edited it last, and when they edited it. Also, a few clicks away are things such as versioning, approve content etc. But in addition to such tracking information, sometimes, you need to fetch data out of a SharePoint installation that may or may not be so straightforward.

Retrieving Data Out of SharePoint

You can choose from many ways to pull data out of SharePoint. You can use the object model and get a hold of the SPList object, and run a for/each over SPListItems. Of course, that isn’t the smartest way to filter for data though. Filtering via iteration can be extremely resource expensive.

You could use search, even programmatically. You can do so by using the FullTextSqlQuery object as demonstrated in the following article at http://blah.winsmarts.com/2008-2-SharePoint_Search_ranking_rules_and_running_it_programatically.aspx. While that will provide you with results quickly, it may or may not provide you with accurate results, and this approach certainly has an external dependency on crawl schedules and algorithms. Given the nature of search, you also have limited control over the sorting mechanisms. The sorting is controlled generally by the rank, which is dependent on an algorithm that you can influence, but not fully control.

And then you have CAML, the Collaborative Application Markup Language. SharePoint uses CAML for many purposes, one of which is extracting the very data you need, and striking an excellent balance between speed, dependability, and accuracy.

Within SharePoint you’ll find a number of objects that use CAML that can help you query for data. In this article, I will look at them one by one.

The Lists Web Service

SharePoint comes with many Web services out of the box. Web services have the innate advantage of isolating atomic pieces of functionality and thus giving you better reliability and flexibility, a concept otherwise known as SOA (service oriented architecture).

Listing 1 shows an easy way to filter out all rows modified by a given user id, using the lists.asmx Web service.

Web services have their advantages, but performance and XmlSerialization isn’t one of them. So it is reasonable to expect that you have very rich support for CAML in the object model as well. At the heart of that you’ll find the SPQuery object.

The SPQuery Object

The SPQuery object allows you to specify a query in CAML syntax, and find the matching items using the SPList.GetItems method. For instance, you could easily use SPQuery to filter out all documents for a given user. You can see this demonstrated by the code shown in Listing 2.

This lets you retrieve a subset of items, matching a given criterion in your .NET code. If you wanted to do further lightweight filtering or sorting etc., you could simply covert the SPListItemCollection to a DataTable, and offer client-side filtering capabilities. Even better, you could use LINQ to sift through your data as shown below:

DataTable returnedItems =
items.GetDataTable();
IEnumerable<DataRow> selectedRows =
   from r in returnedItems.rows
   where r["Title"] =
      "My Announcement"
   select r;

An interesting thing to note about SPList.GetItems and SPQuery is that by default, SPList.GetItems will return you results in only one folder of a given list. If indeed you wanted to recurse the folders in a given list and identify all items matching the given criterion, you need to add the following line to your code:

query.ViewAttributes =
    "Scope=\"Recursive\"";

The SPSiteDataQuery Object

Well of course, now that you figured out how to recurse over folders, the business user coyishly hit you with another requirement.

Could you somehow retrieve all documents, modified or created by the given user, across the entire website? Doh!

Well, you know that the CAML query for such a requirement would look exactly like you have seen in Listing 1 and Listing 2.

But how can you possibly filter out all documents or list items across the entire site? Use the SPSiteDataQuery object as demonstrated in the code below.

using (
SPSite site =
new SPSite("http://moss2007"))
{
    SPWeb web = site.OpenWeb();
    SPSiteDataQuery query = new
       SPSiteDataQuery();
    query.Query = "…";
    DataTable results =
       web.GetSiteData(query);
}

In the code snippet above, I intentionally omitted the CAML query and replaced it with a “…”. SPSiteDataQuery expects to see the CAML query, without the <Query> element, and without any whitespace or line breaks. The query, thus, would look a bit like this, without the line breaks of course.

<Where><Or><Eq><FieldRef
  Name='Author'/><Value Type='User'>
  MOSS2007\\Administrator</Value>
  </Eq><Eq><FieldRef Name='Editor' />
  <Value Type='User'>
  MOSS2007\\Administrator</Value>
</Eq></Or></Where>

The PortalSiteMapProvider Object

SPQuery and SPSiteDataQuery perform quite well. But sometimes, quite well is just not good enough! To give you an idea of what I mean, the out-of-the-box views in SharePoint recommend that you stay under 2000 items under each container. A container could be a view or a folder. You see significant performance degradation beyond the 2000 mark. As an example, a list with 100K items may require up to 152000 milliseconds to fetch data using SPListItems and For Each. This is shortened significantly to about 2700 milliseconds using a simple Page_Load in a browser for the out-of-the-box views. It takes even less time (450 milliseconds) when using CAML and the lists.asmx Web service. In comparison, even search takes about 350 milliseconds to return the results-possibly inaccurately. Amazingly, SPQuery and SPList.GetItems will reduce this time to just over 200 milliseconds. 200 milliseconds to sift through 100K records is quite impressive.

But what if you had a million items?

At some point, in certain very rare situations, you need extreme performance. Performance that can be met by advanced techniques such as caching. The PortalSiteMapProvider provides exactly that. In the above scenario, the PortalSiteMapProvider will reduce the time required to a mere 5-10 milliseconds range for 100K records. Of course, due to caching, the downside of using this approach is much higher memory usage on the Web server. Thus at some point you will actually hurt overall performance by using PortalSiteMapProvider.

The following code demonstrates how to use the PortalSiteMapProvider object.

PortalSiteMapProvider ps =
    PortalSiteMapProvider.
    WebSiteMapProvider ;
SiteMapNodeCollection nodes =
    ps.GetCachedListItemsByQuery(
    (ps.FindSiteMapNode(
    web.ServerRelativeUrl) as
    PortalWebSiteMapNode
    ),
    "Announcements", query, web);

Figuring Out the CAML Search Syntax

It would be incredibly remiss of me to end this article without mentioning an indispensible tool that makes writing CAML queries a whole lot easier. Check out the U2U CAMLBuilder, available as a free download at http://www.u2u.info/SharePoint/U2U%20Community%20Tools/.

Using U2U CAMLBuilder is incredibly simple and through a point-and-click interface, it lets you craft up a parameterized CAML query, which you can then copy and paste in your .NET code. It really doesn’t get much easier than that.

Happy CAML riding!