I am the host of .NET Rocks!, an Internet audio talk show for .NET developers online at www.franklins.net/dotnetrocks and msdn.microsoft.com/dotnetrocks. My co-host, Rory Blyth (www.neopoleon.com), and I interview the movers and shakers in the .NET community. We now have more than 88 shows archived online. We broadcast a new show every Friday night from 7:30 PM to 8:45 PM Eastern Time and make it available for download (with podcasting support) the following Monday.
Show #86 was an interview with Visual Basic Product Manager Jay Roxe. Before landing that role, he was Dev Lead on the Framework and Base Class Libraries, where he worked on some of the fundamental framework and BCL classes. Here is an excerpt from the first half of the show, where he talked about that experience:
Carl: Now, a lot of people don't really know who you are, and so for many of our listeners this is an introduction to Jay. So, just lay a couple of classes in the framework that you've written on us to establish your studliness.
Jay: OK, well let me give you the history. I joined what is now the .NET Framework team, or the Common Language Runtime team, back in November of 1997. [This was] back when it was called Project Lightning, then it became COM+, then it became Project 42, then we had this nice little re-org that made it Project 21; we lost half the team.
And so, I wrote things like String and StringBuilder, and I wrote the initial implementation, although I did not own it forever, of all of the base types like Int [16, 32, and 64], and Double, and all of those. I did some of the work on Object and was Dev Lead for the System.IO classes, the globalization, and a bunch of the collections work as well.
Rory: Hey Jay, you mind if I ask you a couple questions? I'm already curious about some things. First of all, and this was brought up at one of the MSDN events I did this week, why is String sealed? [Note for VB.NET programmers: sealed = NotInheritable.]
Jay: Because we do a lot of magic tricks in String to try and make sure we can optimize for things like comparisons, to make them as fast as we possibly can. So, we're stealing bits off of pointers and other things in there to mark things up. Just to give you an example, and I didn't know this when I started, but if a String has a hyphen or an apostrophe in it [then] it sorts differently than if it just has text in it, and the algorithm for sorting it if you have a hyphen or an apostrophe if you're doing globally-aware sorting is pretty complicated, so we actually mark whether or not the string has that type of behavior in it.
Rory: So, what you're saying is that in the string world, if you didn't seal String there would be a whole lot of room for wreaking a lot of havoc if people were trying to subclass it.
Jay: Exactly. It would change the entire layout of the object, so we wouldn't be able to play the tricks that we play that pick up speed.
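For VB readers, here is a minimal sketch of what "sealed" looks like in code. The class name Token is hypothetical and exists only for illustration; the sketch uses reflection to show that a NotInheritable class reports itself as sealed, and that System.String does too.

```vbnet
Imports System

' Hypothetical type, used only to show the keyword.
Public NotInheritable Class Token
End Class

Module SealedDemo
    ' True when the given type cannot be inherited from.
    Function IsSealedType(ByVal t As Type) As Boolean
        Return t.IsSealed
    End Function

    Sub Main()
        Console.WriteLine(IsSealedType(GetType(Token)))
        Console.WriteLine(IsSealedType(GetType(String)))
    End Sub
End Module
```

Writing "Inherits Token" or "Inherits String" in another class would be rejected by the compiler, which is the guarantee Jay's team relies on when it fixes the layout of String.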
Carl: What I'm really curious about, Jay, is the StringBuilder. It's magic! I mean, I show that to people and they can't believe the difference in speed.
Jay: Which difference in speed?
Carl: Well, the classic demo that I do for people is that I use a For/Next loop taking an Int32 i from 1 to 10,000, and I append i.ToString to a string variable using an &= operator. And it takes about a second on a 2 GHz machine.
Dim Value As String
Dim i As Int32
For i = 1 To 10000
    Value &= i.ToString
Next
Then I put a zero on the Int32 and make it go to 100,000.
Dim Value As String
Dim i As Int32
For i = 1 To 100000
    Value &= i.ToString
Next
And then it takes a long time. After a minute goes by, we're barely past 40,000. Then I do it with a StringBuilder and it's immeasurable. Milliseconds.
Dim Value As String
Dim sb As New System.Text.StringBuilder
Dim i As Int32
For i = 1 To 10000
    sb.Append(i.ToString)
Next
Value = sb.ToString
Jay: Because you're only actually using the String when you get out of the loop, right?
Carl: Right. So what's going on in there?
Jay: When you're appending i.ToString, what we actually do is every time you run that we'll create the string "1" then we'll create the string "1 2" then we'll create the string "1 2 3," and although that probably would be useful for counting the number of dates you're going to get after being in the New York Times (.NET Rocks! was in the New York Times on Thursday, October 28th; see http://www.shrinkster.com/1wj), it's probably not what people are really looking for. [Everyone laughs.]
Carl: So, the new string is the complete string?
Jay: It's the complete string. String doesn't change. It's immutable. What you've actually done when you do the count from 1 to 10,000 is you've created 10,000 instances of String.
Carl: And you're actually using System.String...
Jay: Yes. You've only got a reference to one of the Strings, which means you've got 9,999 of them on the floor.
Carl: Ready for garbage collection.
Jay: Ready for garbage collection. But you've taken the time to create all of those strings, and you're up... what is it, you're probably at 37,000 characters give or take in the last string?
Jay: So that's a pretty big string by the time you get all done.
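The immutability Jay describes is easy to observe. The following is a small sketch, with function and variable names chosen only for illustration: it keeps a second reference to a string, appends with &=, and then checks that the append produced a different object while the original text stayed untouched.

```vbnet
Imports System

Module ImmutabilityDemo
    ' Returns True when &= produced a new String object
    ' and left the original string unchanged.
    Function AppendCreatesNewInstance() As Boolean
        Dim current As String = "1"
        Dim before As String = current   ' second reference to the same object
        current &= "2"                   ' builds a brand-new String, "12"
        Return Not Object.ReferenceEquals(current, before) AndAlso before = "1"
    End Function

    Sub Main()
        Console.WriteLine(AppendCreatesNewInstance())
    End Sub
End Module
```

Each pass through the loop in the demo does the same thing: the variable is pointed at a fresh string and the previous one is left for the garbage collector.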
Carl: Now, why is it so slow when using a String?
Jay: Well, every time you create one of those you're doing a 37,000 character copy when you get toward the end. Now, every [character in a] string is two bytes, so you're moving large amounts of memory and I'm sure one of your listeners who has a calculator in front of them can figure out that if you're moving 37,000 characters times 2 bytes times 10,000 strings...
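For listeners without a calculator, here is a rough sketch of that arithmetic. It assumes a simplified cost model in which every append writes the whole new string once, at two bytes per character, and it ignores object headers and allocator overhead. The function names are made up for this example.

```vbnet
Imports System

Module CopyCost
    ' Length of the final string after appending 1..n as decimal text.
    Function FinalLength(ByVal n As Integer) As Long
        Dim total As Long = 0
        For i As Integer = 1 To n
            total += i.ToString().Length
        Next
        Return total
    End Function

    ' Characters written if every append rebuilds the whole string.
    Function CharsCopied(ByVal n As Integer) As Long
        Dim length As Long = 0
        Dim copied As Long = 0
        For i As Integer = 1 To n
            length += i.ToString().Length   ' size of the string after this append
            copied += length                ' the whole string is written again
        Next
        Return copied
    End Function

    Sub Main()
        Console.WriteLine("Final length: {0} characters", FinalLength(10000))
        Console.WriteLine("Bytes moved:  {0}", CharsCopied(10000) * 2)
    End Sub
End Module
```

Under this model the final string is 38,894 characters, close to Jay's estimate, and the loop moves roughly 379 million bytes. Jay's back-of-the-envelope product (37,000 x 2 x 10,000, or 740 million bytes) is an upper bound, because most of the intermediate strings are shorter than the last one.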
Carl: Well, wait a minute. If you're using the StringBuilder, aren't you also copying?
Jay: No. That's actually the trick. The StringBuilder object itself points to a string, and we have some internal APIs on string that allow us to change that one string. So let's say you create the StringBuilder with just "1" in it, and then you add "2." Normally you would have created two string instances. We actually have a pointer to what at that point is a hidden string, where we can just create the string "1 2" without having to copy it.
Now, we have to do an occasional copy because when you get up to having 100 characters in the string or whatever, you have to grow the string, we have to grow the buffer, just like you would with any growable array. What that means is we only have to copy it when we're out of memory.
The trick is that whenever you ask us for a pointer to that string (with .ToString) we just go in and give you back that string object and mark it as dirty. So that if you, say, wanted to add again, at that point we create another copy of the string.
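The growable-buffer idea can be sketched in a few lines. This is not the real StringBuilder: Jay's version mutates a hidden String through internal APIs that ordinary code cannot reach, so this hypothetical TinyBuilder class uses a Char array instead and makes one copy in ToString. What it does share with his description is the growth strategy: copy only when the buffer runs out of room.

```vbnet
Imports System

' Hypothetical class for illustration; not the framework's implementation.
Public Class TinyBuilder
    Private _buffer(15) As Char          ' 16 slots to start with
    Private _length As Integer = 0
    Private _growCount As Integer = 0

    ' Number of times the buffer had to be reallocated and copied.
    Public ReadOnly Property GrowCount() As Integer
        Get
            Return _growCount
        End Get
    End Property

    Public Sub Append(ByVal piece As String)
        Dim needed As Integer = _length + piece.Length
        If needed > _buffer.Length Then
            ' Out of room: at least double the capacity and copy once.
            Dim newSize As Integer = Math.Max(needed, _buffer.Length * 2)
            Dim bigger(newSize - 1) As Char
            Array.Copy(_buffer, bigger, _length)
            _buffer = bigger
            _growCount += 1
        End If
        ' Write the new characters in place; nothing already stored moves.
        piece.CopyTo(0, _buffer, _length, piece.Length)
        _length = needed
    End Sub

    Public Overrides Function ToString() As String
        Return New String(_buffer, 0, _length)
    End Function
End Class

Module TinyBuilderDemo
    Sub Main()
        Dim tb As New TinyBuilder()
        For i As Integer = 1 To 10000
            tb.Append(i.ToString())
        Next
        Console.WriteLine("Length: {0}", tb.ToString().Length)
        Console.WriteLine("Buffer copies: {0}", tb.GrowCount)
    End Sub
End Module
```

With a 16-character starting buffer that doubles each time, the 10,000 appends need only 12 buffer copies, compared with one full copy per append in the plain String loop.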
Carl: That's pretty tricky. But it's those internal APIs that make it so much faster than a straight copy then, right?
Jay: Yep. And any time you see a String, you can count on it being immutable. But we can take advantage of the fact that we know some things about strings to get you that higher performance.
Carl: Am I creating more object references in a string loop, versus only one reference with StringBuilder?
Jay: Well, actually creating the object reference itself is pretty cheap. We did a lot of benchmarking, because we know this is something people really care about. And, it turns out that most of the time is actually spent doing the copy. So even if you've got a pretty optimized memcopy, that's still a lot of memory moving around.
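Anyone who wants to repeat the comparison can time both loops with System.Diagnostics.Stopwatch. This is only a sketch of a measurement harness, with function names invented for the example; the numbers will depend on the machine and runtime, so none are shown here.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Text

Module AppendTiming
    ' Builds "123...n" with repeated String concatenation.
    Function BuildWithString(ByVal n As Integer) As String
        Dim result As String = ""
        For i As Integer = 1 To n
            result &= i.ToString()
        Next
        Return result
    End Function

    ' Builds the same text with a StringBuilder.
    Function BuildWithBuilder(ByVal n As Integer) As String
        Dim sb As New StringBuilder()
        For i As Integer = 1 To n
            sb.Append(i.ToString())
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Dim watch As Stopwatch = Stopwatch.StartNew()
        Dim a As String = BuildWithString(10000)
        Console.WriteLine("String &=:     {0} ms", watch.ElapsedMilliseconds)

        watch = Stopwatch.StartNew()
        Dim b As String = BuildWithBuilder(10000)
        Console.WriteLine("StringBuilder: {0} ms", watch.ElapsedMilliseconds)

        ' Both approaches must produce identical text.
        Console.WriteLine("Same result: {0}", a = b)
    End Sub
End Module
```

Raising the loop bound to 100,000 should make the gap between the two timings much wider, since the copying cost of the String version grows far faster than the loop count.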
After the break we started talking about C# getting Edit and Continue in C# 2.0, and that led into a discussion of some more features of Visual Basic 2005.
Jay: C# users I'm sure are delighted to be getting Edit and Continue. If only they were getting My as well.
Rory: Yeah, but we can use My, can't we?
Jay: You can use parts of My. My is actually two things. There are sections of My that are shortcuts into the .NET Framework. So, things like My.Computer.FileSystem.ReadAllText is a shortcut into the Framework. So if you imported the Microsoft.VisualBasic namespace, you could go use that type of stuff from within C#, although you wouldn't have the keyword My at the top.
There are other things like Settings and Web Services and Resources that are generated on the fly in the background for you. So that stuff is specific to your project, and C# isn't going to get that part of My.
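To illustrate the "shortcut" half of My: the file-system helpers live in the Microsoft.VisualBasic.FileIO.FileSystem class in Microsoft.VisualBasic.dll, and any .NET language that references that assembly can call them. The sketch below calls the class directly and shows the My form only in a comment, because My itself is available only inside a VB project. The RoundTrip helper and the temporary file are assumptions made for the example.

```vbnet
Imports System
Imports System.IO

Module MyShortcutDemo
    ' Writes text to a temporary file and reads it back through the
    ' class that sits behind My.Computer.FileSystem.
    Function RoundTrip(ByVal content As String) As String
        Dim tempFile As String = Path.GetTempFileName()
        Try
            File.WriteAllText(tempFile, content)
            ' Inside a VB project the next line could be written as:
            '   My.Computer.FileSystem.ReadAllText(tempFile)
            Return Microsoft.VisualBasic.FileIO.FileSystem.ReadAllText(tempFile)
        Finally
            File.Delete(tempFile)
        End Try
    End Function

    Sub Main()
        Console.WriteLine(RoundTrip("Hello from the VB runtime library"))
    End Sub
End Module
```

A C# program would make the same call after adding a reference to Microsoft.VisualBasic.dll; what it would not get are the generated, project-specific parts such as My.Settings and My.Resources.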
Rory: Wow! Oh, Crap! I was thinking the whole time that I could just set a reference and start using it. Well, while we're talking about that you want to talk about the refactoring story for Whidbey in VB?
Jay: Well, in 2005, VB is going to have the Rename symbol. We took a look at adding some more of the other refactorings because we've heard from people in the community that they'd love to have some more of them, and we just can't make it work.
Rory: So, what you're telling me is that C# developers get part of the My namespace and VB developers get some of the refactoring abilities of C#. Bam! Oh yeah! [laughs] Sorry. I felt one-upped there for a minute and I felt I had to kind of strike back a little bit. I'm a C# guy, and I was totally bummed about not getting the My namespace, but as long as we get something in there to make up for it...
Jay: We're actually working with a few partners [to do some of these things in VB.NET] because we know there are so many people out there who want it, and we want to add them in there in the next version of VB.
Carl: Let me ask you this. You said that you just couldn't do it. Do you mean that technically it wasn't possible to do the refactoring stuff? It seems odd to me.
Jay: Technically anything's possible. But it's a question of how do we deliver VB 2005 when everybody wants it. I've just come back from this VB user group tour where we've had people go to about 40 cities all around the world. And the main message that we get when we demo 2005 is "ship!" So, what's the cost in doing it in terms of the technical aspects, and also the overall schedule?
Rory: What you do have though is those code-snippet thingys.
Jay: Yeah, we're going to have about 500 of those [as well as IntelliSense expansions in C#]. We're actually doing a bunch more work to let you get to whatever type of IntelliSense expansion you want. So if you want the If loop written out for you, we're actually introducing the question mark syntax which is similar to what C# uses. The place people should go to check that out is the Visual Basic Team Blog at http://blogs.msdn.com/vbteam/.
Carl: You made a comment at a user group meeting that some things in the Microsoft.VisualBasic namespace are faster than their framework counterparts. Can you elaborate a little on that?
Jay: I think the thing we were talking about was CType in the Microsoft.VisualBasic namespace. CType is one of those things where we did a lot of work to make sure that it keeps the VB6 functionality because we know there are a lot of people that are used to that. [CType] does a lot more than, say, Convert.ToInt32 just in terms of the different types of parsings that it does.
So if you're trying to get that same type of behavior that you get with CType, it's easier to use CType than it is to actually go in and emulate that using the various .NET Framework calls that you'd have to make.
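One concrete case of the extra parsing Jay mentions is VB-style hexadecimal text. The sketch below, with helper names invented for the example, converts the string "&H10" with CType and then checks whether Convert.ToInt32 accepts the same input.

```vbnet
Imports System

Module ConversionDemo
    ' VB's own conversion understands VB-style literals such as "&H10".
    Function VbParse(ByVal s As String) As Integer
        Return CType(s, Integer)
    End Function

    ' True when Convert.ToInt32 refuses the same text.
    Function FrameworkRejects(ByVal s As String) As Boolean
        Try
            Convert.ToInt32(s)
            Return False
        Catch ex As FormatException
            Return True
        End Try
    End Function

    Sub Main()
        Console.WriteLine(VbParse("&H10"))
        Console.WriteLine(FrameworkRejects("&H10"))
    End Sub
End Module
```

CType returns 16 for "&H10", while Convert.ToInt32 throws a FormatException for it. Getting the VB behavior from framework calls alone would mean detecting the prefix and calling Convert.ToInt32 with a base of 16 yourself.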
One of the things that everyone will notice about Visual Studio 2005 is just how much snappier performance has gotten around things like IDE startup and just the general behavior of the environment. I know that there are a couple of performance issues in the beta that we're working on, but people are going to see really stepped-up performance.
This was a really cool interview experience for me. It was great to finally understand how the StringBuilder works. Even though he didn't describe those "special APIs" I have a pretty good idea of what they do, and now so do you.
If you haven't listened to our show, please do. One word of warning, however. The shows between January 1, 2004 and November 15, 2004, contain adult-oriented humor. We started in January 2004 with Rory Blyth as the co-host and we delved into some comedy that had nothing to do with programming. Starting in November, 2004, .NET Rocks! is going back to its roots as an interview-only show. To fill our needs to be wacky, we started another show for a general geek-oriented audience called Mondays, which you can check out online at http://mondays.pwop.com.