Learn How to Avoid the 10 Most Common C# Mistakes
About C#
C# is one of several languages that target the Microsoft Common Language Runtime (CLR). Languages that target the CLR benefit from features such as cross-language integration and exception handling, enhanced security, a simplified model for component interaction, and debugging and profiling services. Of today’s CLR languages, C# is the most widely used for complex, professional development projects that target the Windows desktop, mobile, or server environments.
C# is an object-oriented, strongly typed language. Its strict type checking, at both compile time and run time, means that most typical programming errors are reported as early as possible, with their locations pinpointed quite accurately. This can save the C# programmer a lot of time compared with languages that are more liberal in enforcing type safety, where puzzling errors can surface long after the offending operation takes place. However, a lot of programmers unwittingly (or carelessly) throw away the benefits of this detection, which leads to some of the issues discussed in this C# tutorial.
About this Tutorial
This tutorial describes 10 of the most common mistakes made by C# programmers, or problems they should avoid, and offers help with each.
While most of the mistakes discussed in this article are C# specific, some are also relevant to other languages that target the CLR or make use of the
Common Mistake #1: Using a reference like a value or vice versa
C++ programmers (and programmers in many other languages) are accustomed to controlling whether the values they assign to variables are simply values or references to existing objects. In C#, however, that decision is made by the programmer who wrote the type, not by the programmer who instantiates the object and assigns it to a variable. This is a common “gotcha” for newcomers to C#.
If you don’t know whether the object you’re using is a value type or reference type, you could run into some surprises. For example:
using System;
using System.Drawing; // Point, Pen and Color
Point point1 = new Point(20, 30);
Point point2 = point1;
point2.X = 50;
Console.WriteLine(point1.X); // 20 (does this surprise you?)
Console.WriteLine(point2.X); // 50
Pen pen1 = new Pen(Color.Black);
Pen pen2 = pen1;
pen2.Color = Color.Blue;
Console.WriteLine(pen1.Color); // Blue (or does this surprise you?)
Console.WriteLine(pen2.Color); // Blue
As you can see, both the Point and Pen objects were created the exact same way, but the value of point1 remained unchanged when a new X coordinate value was assigned to point2, whereas the value of pen1 was modified when a new color was assigned to pen2.
We can therefore deduce that point1 and point2 each contain their own copy of a Point object, whereas pen1 and pen2 contain references to the same Pen object. But how can we know that without doing this experiment?
The answer is to look at the definitions of the object types (which you can easily do in Visual Studio by placing your cursor over the name of the object type and pressing F12):
public struct Point { … } // defines a “value” type
public class Pen { … } // defines a “reference” type
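As shown above, in C#, the struct keyword is used to define a value type, while the class keyword is used to define a reference type. In C++, by contrast, struct and class differ only in their default member access, and whether a variable holds a value or a reference (or pointer) is decided where the variable is declared. For those with a C++ background, who were lulled into a false sense of security by the many similarities between C++ and C# keywords, this behavior likely comes as a surprise that may have you asking for help from a C# tutorial.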
If you’re going to depend on some behavior which differs between value and reference types – such as the ability to pass an object as a method parameter and have that method change the state of the object – make sure that you’re dealing with the correct type of object to avoid C# problems.
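To see the same difference when an object is passed to a method, here is a minimal, self-contained sketch. PointValue, PointRef and ParameterDemo are hypothetical names chosen for illustration (they are not the System.Drawing types used above); the only difference between the two types is struct versus class.

```csharp
using System;

// Hypothetical types for illustration: identical fields, different kinds of type.
public struct PointValue { public int X; } // value type: copied on assignment and when passed
public class PointRef { public int X; }    // reference type: the reference is copied, not the object

public static class ParameterDemo
{
    // Receives a copy of the caller's struct; the change is lost when the method returns.
    public static void MoveValue(PointValue p) => p.X = 50;

    // Receives a copy of the reference; both refer to the same object, so the change is visible.
    public static void MoveRef(PointRef p) => p.X = 50;

    // Passing by ref lets the method modify the caller's struct variable itself.
    public static void MoveValueByRef(ref PointValue p) => p.X = 50;

    public static void Main()
    {
        var v = new PointValue { X = 20 };
        var r = new PointRef { X = 20 };

        MoveValue(v);
        MoveRef(r);
        Console.WriteLine(v.X); // 20
        Console.WriteLine(r.X); // 50

        MoveValueByRef(ref v);
        Console.WriteLine(v.X); // 50
    }
}
```

Passing the struct with ref is one way to let a method change the caller’s variable; without it, the method only ever sees a copy.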
Common Mistake #2: Misunderstanding default values for uninitialized variables
In C#, value types can’t be null. By definition, a value type always has a value, so even an uninitialized variable of a value type must have one: the default value for that type. This leads to the following, often unexpected, result when checking whether a variable is uninitialized:
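using System;
using System.Drawing; // Point is a struct, Pen is a class
class Program {
    static Point point1;
    static Pen pen1;
    static void Main(string[] args) {
        Console.WriteLine(pen1 == null); // True
        Console.WriteLine(point1 == null); // False (huh?) The compiler also warns that this is always false.
    }
}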
Why isn’t point1 null? The answer is that Point is a value type, and the default value for a Point is (0,0), not null. Failure to recognize this is a very easy (and common) mistake to make in C#.
Some value types, Point among them, provide an IsEmpty property, which you can check to see whether the value equals the type’s default:
Console.WriteLine(point1.IsEmpty); // True
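Note that IsEmpty is also true for a Point that was deliberately set to (0,0), so it cannot tell “uninitialized” apart from “explicitly zero”. When you’re checking whether a variable has been initialized, make sure you know what value an uninitialized variable of that type will have by default, and don’t rely on it being null.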
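The following sketch uses a hypothetical Coordinate struct (rather than System.Drawing.Point, so it has no dependencies) to compare against default(T) instead of null. It also shows Nullable<T>, written Coordinate?, as one way to represent “not set yet” explicitly when the default value is also a legitimate value.

```csharp
using System;

// Hypothetical value type used only for illustration.
public struct Coordinate
{
    public int X;
    public int Y;
}

public static class DefaultValues
{
    // A value-type field that is never assigned holds default(Coordinate), i.e. (0, 0).
    static Coordinate unassigned;

    public static Coordinate Unassigned => unassigned;

    // True when the value equals the type's default; an explicit (0, 0) qualifies too.
    public static bool IsDefault(Coordinate c) => c.Equals(default(Coordinate));

    public static void Main()
    {
        Console.WriteLine(IsDefault(Unassigned));                      // True
        Console.WriteLine(IsDefault(new Coordinate { X = 0, Y = 0 })); // True: indistinguishable

        // Nullable<Coordinate> can express "no value yet" separately from (0, 0).
        Coordinate? maybe = null;
        Console.WriteLine(maybe.HasValue); // False
        maybe = new Coordinate { X = 0, Y = 0 };
        Console.WriteLine(maybe.HasValue); // True, even though the value is (0, 0)
    }
}
```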
Common Mistake #3: Using improper or unspecified string comparison methods
There are many different ways to compare strings in C#.
Although many programmers use the == operator for string comparison, it is actually one of the least desirable methods to employ, primarily because it doesn’t specify explicitly in the code which type of comparison is wanted.
Rather, the preferred way to test for string equality in C# is with the Equals method:
public bool Equals(string value);
public bool Equals(string value, StringComparison comparisonType);
The first method signature (i.e., without the comparisonType parameter) is actually the same as using the == operator, but has the benefit of being explicitly applied to strings. It performs an ordinal comparison of the strings, which is essentially a character-by-character comparison of their UTF-16 code units. In many cases this is exactly the type of comparison you want, especially when comparing strings whose values are set programmatically, such as file names, environment variables, attributes, etc. In these cases, as long as an ordinal comparison is indeed the correct type of comparison for that situation, the only downside to using the Equals method without a comparisonType is that somebody reading the code may not know what type of comparison you’re making.
Using the Equals method signature that includes a comparisonType every time you compare strings, though, will not only make your code clearer, it will make you explicitly think about which type of comparison you need to make. This is a worthwhile habit: even if English shows few differences between ordinal and culture-sensitive comparisons, other languages show plenty, and ignoring them opens you up to a lot of potential errors down the road. For example:
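// Culture-sensitive results (CurrentCulture*) depend on the current culture and on the
// platform's globalization library, so they may differ from the outputs listed below.
string s = "strasse";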
// outputs False:
Console.WriteLine(s == "straße");
Console.WriteLine(s.Equals("straße"));
Console.WriteLine(s.Equals("straße", StringComparison.Ordinal));
Console.WriteLine(s.Equals("Straße", StringComparison.CurrentCulture));
Console.WriteLine(s.Equals("straße", StringComparison.OrdinalIgnoreCase));
// outputs True:
Console.WriteLine(s.Equals("straße", StringComparison.CurrentCulture));
Console.WriteLine(s.Equals("Straße", StringComparison.CurrentCultureIgnoreCase));
The safest practice is to always provide a comparisonType parameter to the Equals method.
Here are some basic guidelines:
- When comparing strings that were input by the user, or are to be displayed to the user, use a culture-sensitive comparison (CurrentCulture or CurrentCultureIgnoreCase).
- When comparing programmatic strings, use ordinal comparison (Ordinal or OrdinalIgnoreCase).
- InvariantCulture and InvariantCultureIgnoreCase are generally not to be used except in very limited circumstances, because ordinal comparisons are more efficient. If a culture-aware comparison is necessary, it should usually be performed against the current culture or another specific culture.
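As a rough illustration of these guidelines, the sketch below uses ordinal comparisons for programmatic strings (a file extension and settings keys) and states the comparison rules explicitly. The helper names HasExtension and CreateSettings and the “MaxRetries” key are hypothetical; String.EndsWith, String.Equals and StringComparer.OrdinalIgnoreCase are standard .NET APIs.

```csharp
using System;
using System.Collections.Generic;

public static class StringComparisonDemo
{
    // Programmatic string (a file extension): ordinal, case-insensitive comparison,
    // so the result does not depend on the machine's current culture.
    public static bool HasExtension(string fileName, string extension) =>
        fileName.EndsWith(extension, StringComparison.OrdinalIgnoreCase);

    // Dictionary keyed by identifiers: pass the comparer explicitly so the lookup rules are visible.
    public static Dictionary<string, int> CreateSettings() =>
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
        {
            ["MaxRetries"] = 3,
        };

    public static void Main()
    {
        Console.WriteLine(HasExtension("REPORT.TXT", ".txt"));         // True
        Console.WriteLine(CreateSettings().ContainsKey("maxretries")); // True

        // Ordinal comparison never treats "ß" as "ss", whatever the culture.
        Console.WriteLine("strasse".Equals("straße", StringComparison.Ordinal)); // False

        // User-facing text: culture-sensitive; the result can vary by culture and platform.
        Console.WriteLine(string.Equals("strasse", "straße", StringComparison.CurrentCulture));
    }
}
```

The last line deliberately has no expected output, because culture-sensitive results depend on the machine’s current culture.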
In addition to the Equals method, strings also provide the Compare method, which gives you information about the relative order of strings instead of just a test for equality. C# does not define the <, <=, > and >= operators for strings, so String.Compare is the natural way to order them; as with Equals, prefer the overloads that take a StringComparison argument, for the same reasons discussed above.
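A small sketch of ordering strings with String.Compare and an explicit StringComparer follows; ComesBefore and SortOrdinal are hypothetical helper names. Since Compare only guarantees the sign of its result, the helper tests for a negative value rather than for -1.

```csharp
using System;
using System.Collections.Generic;

public static class CompareDemo
{
    // String.Compare returns a negative number, zero, or a positive number;
    // only the sign is meaningful, so test the sign rather than a specific value.
    public static bool ComesBefore(string a, string b) =>
        string.Compare(a, b, StringComparison.Ordinal) < 0;

    // Sorting with an explicit comparer makes the ordering rules visible in the code.
    public static List<string> SortOrdinal(IEnumerable<string> items)
    {
        var list = new List<string>(items);
        list.Sort(StringComparer.Ordinal);
        return list;
    }

    public static void Main()
    {
        // "apple" < "banana" would not compile: C# defines only == and != for string.
        Console.WriteLine(ComesBefore("apple", "banana")); // True

        // Ordinal order compares UTF-16 code units, so uppercase letters sort before lowercase.
        Console.WriteLine(string.Join(",", SortOrdinal(new[] { "b", "B", "a", "A" }))); // A,B,a,b
    }
}
```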
Common Mistake #4: Using iterative (instead of declarative) statements to manipulate collections
The addition of Language-Integrated Query (LINQ) in C# 3.0 forever changed the way collections are queried and manipulated. Since then, if you’re using iterative statements to manipulate collections, chances are you’re not using LINQ where you probably should.
Some C# programmers don’t even know of LINQ’s existence, but fortunately that number is becoming increasingly small. Many still think, though, that because of the similarity between LINQ keywords and SQL statements, its only use is in code that queries databases.
While database querying is a very prevalent use of LINQ statements, they actually work over any enumerable collection (i.e., any object that implements the IEnumerable interface). So for example, if you had an array of Accounts, instead of writing:
decimal total = 0;
foreach (Account account in myAccounts) {
    if (account.Status == "active") {
        total += account.Balance;
    }
}
you could just write:
decimal total = (from account in myAccounts
                 where account.Status == "active"
                 select account.Balance).Sum();
While this is a pretty simple example of how to avoid this common C# programming problem, there are cases where a single LINQ statement can easily replace dozens of statements in an iterative loop (or nested loops) in your code. And less code generally means fewer opportunities for bugs to be introduced. Keep in mind, however, that there may be a trade-off in terms of performance. In performance-critical scenarios, especially where your iterative code is able to make assumptions about your collection that LINQ cannot, be sure to compare the performance of the two approaches.
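For reference, here is a self-contained version of the two snippets above. The Account class is a hypothetical stand-in with just the Status and Balance members the snippets use, and the LINQ version is written in method syntax (Where followed by Sum), which produces the same result as the query expression.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Account type with only the members used in the snippets above.
public class Account
{
    public string Status { get; set; } = "";
    public decimal Balance { get; set; }
}

public static class AccountTotals
{
    // Iterative version: explicit loop and accumulator.
    public static decimal ActiveTotalLoop(IEnumerable<Account> accounts)
    {
        decimal total = 0;
        foreach (Account account in accounts)
        {
            if (account.Status == "active")
            {
                total += account.Balance;
            }
        }
        return total;
    }

    // Declarative version in LINQ method syntax.
    public static decimal ActiveTotalLinq(IEnumerable<Account> accounts) =>
        accounts.Where(a => a.Status == "active").Sum(a => a.Balance);

    public static void Main()
    {
        var accounts = new[]
        {
            new Account { Status = "active", Balance = 100m },
            new Account { Status = "closed", Balance = 40m },
            new Account { Status = "active", Balance = 25.5m },
        };
        // Both approaches yield the same total.
        Console.WriteLine(ActiveTotalLoop(accounts) == ActiveTotalLinq(accounts)); // True
    }
}
```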
Common Mistake #5: Failing to consider the underlying objects in a LINQ statement
LINQ is great for abstracting the task of manipulating collections, whether they are in-memory objects, database tables, or XML documents. In a perfect world, you wouldn’t need to know what the underlying objects are. But the error here is assuming we live in a perfect world. In fact, identical LINQ statements can return different results when executed on the exact same data, if that data happens to be in a different format.
For instance, consider the following statement:
decimal total = (from account in myAccounts
                 where account.Status == "active"
                 select account.Balance).Sum();
What happens if one of the accounts has a Status of “Active” (note the capital A)? Well, if myAccounts was a DbSet object backed by a database that uses a case-insensitive collation (as SQL Server’s default collations typically are), the where expression would still match that element. However, if myAccounts was an in-memory array, it would not match, and would therefore yield a different result for total.
But wait a minute. When we talked about string comparison earlier, we saw that the == operator performed an ordinal comparison of strings. So why in this case is the == operator performing a case-insensitive comparison?
The answer is that when the underlying objects in a LINQ statement are references to SQL table data (as is the case with the Entity Framework DbSet object in this example), the statement is converted into a T-SQL statement. The comparison is then evaluated by the database according to T-SQL rules and the column’s collation, not C# rules, so in the above case it ends up being case-insensitive.
In general, even though LINQ is a helpful and consistent way to query collections of objects, in reality you still need to know whether or not your statement will be translated to something other than C# under the hood to ensure that the behavior of your code will be as expected at runtime.
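Only the in-memory half of the Status example can be reproduced without a database, so the sketch below covers just that half, using a hypothetical Account class. It runs the query through IQueryable<T> (via AsQueryable()) to show that the interface alone does not change the semantics: over an array, == remains C#’s case-sensitive comparison. Stating the intended comparison explicitly removes the ambiguity in memory; whether a particular database provider can translate the StringComparison overload of string.Equals is provider-specific (Entity Framework Core, for instance, does not translate it), so check your provider’s documentation before relying on it in a database query.

```csharp
using System;
using System.Linq;

// Hypothetical Account type for illustration.
public class Account
{
    public string Status { get; set; } = "";
    public decimal Balance { get; set; }
}

public static class ProviderSemantics
{
    public static readonly Account[] InMemory =
    {
        new Account { Status = "active", Balance = 100m },
        new Account { Status = "Active", Balance = 50m }, // note the capital A
    };

    // The query text is provider-neutral; its meaning depends on who executes it.
    public static decimal TotalWithOperator(IQueryable<Account> accounts) =>
        accounts.Where(a => a.Status == "active").Sum(a => a.Balance);

    public static void Main()
    {
        // AsQueryable() over an array still executes with LINQ to Objects semantics,
        // so the "Active" element is excluded: the total is 100, not 150.
        Console.WriteLine(TotalWithOperator(InMemory.AsQueryable()) == 100m); // True

        // Stating the intended comparison explicitly includes it in memory.
        decimal explicitTotal = InMemory
            .Where(a => string.Equals(a.Status, "active", StringComparison.OrdinalIgnoreCase))
            .Sum(a => a.Balance);
        Console.WriteLine(explicitTotal == 150m); // True
    }
}
```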
Common Mistake #6: Getting confused or faked out by extension methods
As mentioned earlier, LINQ statements work on any object that implements IEnumerable. For example, the following simple function will add up the balances on any collection of accounts:
public decimal SumAccounts(IEnumerable<Account> myAccounts) {
return myAccounts.Sum(a => a.Balance);
}
In the above code, the type of the myAccounts parameter is declared as IEnumerable<Account>. Since myAccounts references a Sum method (C# uses the familiar “dot notation” to reference a method on a class or interface), we’d expect to see a method called Sum() on the definition of the IEnumerable<T> interface.
However, the definition of IEnumerable<T> makes no reference to any Sum method and simply looks like this:
public interface IEnumerable<out T> : IEnumerable {
IEnumerator<T> GetEnumerator();
}
So where is the Sum() method defined? C# is strongly typed, so if the reference to the Sum method was invalid, the C# compiler would certainly flag it as an error. We therefore know that it must exist, but where? Moreover, where are the definitions of all the other methods that LINQ provides for querying or aggregating these collections?
The answer is that Sum() is not a method defined on the IEnumerable interface. Rather, it is a static method (called an “extension method”) that is defined on the System.Linq.Enumerable class:
namespace System.Linq {
public static class Enumerable {
...
// the reference here to “this IEnumerable<TSource> source” is
// the magic sauce that provides access to the extension method Sum
public static decimal Sum<TSource>(this IEnumerable<TSource> source,
Func<TSource, decimal> selector);
...
}
}
So what makes an extension method different from any other static method and what enables us to access it in other classes?
The distinguishing characteristic of an extension method is the this modifier on its first parameter. This is the “magic” that identifies it to the compiler as an extension method. The type of the parameter it modifies (in this case IEnumerable<TSource>) denotes the class or interface which will then appear to implement this method. (The method must also be declared in a static, non-generic, non-nested class, as Enumerable is.)
(As a side point, there’s nothing magical about the similarity between the name of the IEnumerable interface and the name of the Enumerable class on which the extension method is defined. This similarity is just an arbitrary stylistic choice.)
With this understanding, we can also see that the SumAccounts method we introduced above could instead have been implemented as follows:
public decimal SumAccounts(IEnumerable<Account> myAccounts) {
return Enumerable.Sum(myAccounts, a => a.Balance);
}
The fact that we could have implemented it this way instead raises the question of why have extension methods at all? Extension methods are essentially a convenience of the C# language that enables you to “add” methods to existing types without creating a new derived type, recompiling, or otherwise modifying the original type.
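As a minimal sketch of writing your own, the hypothetical WordCount method below “adds” a method to System.String without deriving from or modifying it. The namespace TextUtilities and the method name are assumptions for illustration; the second call shows that the extension syntax is just another way of writing an ordinary static call.

```csharp
using System;

namespace TextUtilities // hypothetical namespace: callers need "using TextUtilities;"
{
    public static class StringExtensions
    {
        // The "this" modifier on the first parameter makes this callable as s.WordCount().
        // An empty separator array tells Split to break on white space.
        public static int WordCount(this string s) =>
            s.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries).Length;
    }
}

namespace Demo
{
    using TextUtilities; // brings StringExtensions into scope

    public static class Program
    {
        public static void Main()
        {
            string sentence = "extension methods are syntactic sugar";
            Console.WriteLine(sentence.WordCount());                 // 5
            Console.WriteLine(StringExtensions.WordCount(sentence)); // 5: the same call, written statically
        }
    }
}
```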
Extension methods are brought into scope by including a using [namespace]; directive at the top of the file. You need to know which namespace includes the extension methods you’re looking for, but that’s pretty easy to determine once you know what it is you’re searching for.
When the C# compiler encounters a method call on an instance of an object and doesn’t find an applicable method defined on the object’s type, it then looks at all extension methods that are in scope to try to find one that matches the required method signature and type. If it finds one, it passes the instance reference as the first argument to that extension method, followed by the remaining arguments, if any. (If the C# compiler doesn’t find any corresponding extension method in scope, it reports a compile-time error.)
Extension methods are an example of “syntactic sugar” on the part of the C# compiler, which allows us to write code that is (usually) clearer and more maintainable. Clearer, that is, if you’re aware of their usage. Otherwise, it can be a bit confusing, especially at first.
While there certainly are advantages to using extension methods, they can cause problems and a cry for C# help for those developers who aren’t aware of them or don’t properly understand them. This is especially true when looking at code samples online, or at any other pre-written code. When such code produces compiler errors (because it invokes methods that clearly aren’t defined on the classes they’re invoked on), the tendency is to think the code applies to a different version of the library, or to a different library altogether. A lot of time can be spent searching for a new version, or phantom “missing library”, that doesn’t exist.
Even developers who are familiar with extension methods still get caught occasionally, when there is a method with the same name on the object, but its method signature differs in a subtle way from that of the extension method. A lot of time can be wasted looking for a typo or error that just isn’t there.
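The following sketch reproduces that trap with hypothetical Logger and LoggerExtensions types. Because the compiler only falls back to extension methods when no instance method is applicable, an instance method whose parameter merely accepts the argument through an implicit conversion (int to long here) wins over an extension method that matches the argument type exactly.

```csharp
using System;

// Hypothetical types for illustration.
public class Logger
{
    // Instance method taking a long.
    public string Write(long value) => "instance Write(long)";
}

public static class LoggerExtensions
{
    // Extension method whose parameter type (int) matches an int argument exactly...
    public static string Write(this Logger logger, int value) => "extension Write(int)";

    // ...and one whose signature the instance method cannot accept at all.
    public static string Write(this Logger logger, string value) => "extension Write(string)";
}

public static class ResolutionDemo
{
    public static void Main()
    {
        var logger = new Logger();

        // The instance method is applicable (int converts implicitly to long),
        // so the better-matching extension method is never considered.
        Console.WriteLine(logger.Write(42));      // instance Write(long)

        // No instance overload accepts a string, so extension methods are searched.
        Console.WriteLine(logger.Write("hello")); // extension Write(string)
    }
}
```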
Use of extension methods in C# libraries is becoming increasingly prevalent. In addition to LINQ, the Unity Application Block and the Web API framework are examples of two heavily-used modern libraries by Microsoft which make use of extension methods as well, and there are many others. The more modern the framework, the more likely it is that it will incorporate extension methods.
Of course, you can write your own extension methods as well. Realize, however, that while extension methods appear to get invoked just like regular instance methods, this is really just an illusion. In particular, your extension methods can’t reference private or protected members of the class they’re extending and therefore cannot serve as a complete replacement for more traditional class inheritance.
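To make that limitation concrete, the hypothetical Counter class below keeps its state in a private field. An extension method for it is an ordinary static method in another class, so it can only use the members it is allowed to access and has to go through the public Increment method.

```csharp
using System;

// Hypothetical class for illustration.
public class Counter
{
    private int count; // private: not accessible from any extension method
    public void Increment() => count++;
    public int Count => count;
}

public static class CounterExtensions
{
    // Can use only Counter's accessible members (public, or internal within the same assembly).
    // Writing "counter.count += n;" here would not compile, because count is private to Counter.
    public static void IncrementBy(this Counter counter, int n)
    {
        for (int i = 0; i < n; i++)
        {
            counter.Increment();
        }
    }
}

public static class CounterDemo
{
    public static void Main()
    {
        var counter = new Counter();
        counter.IncrementBy(3);           // reads like an instance call, but is a static call
        Console.WriteLine(counter.Count); // 3
    }
}
```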
Common Mistake #7: Using the wrong type of collection for the task at hand
C# provides a large variety of collection objects, with the following being only a partial list:
Array, ArrayList, BitArray, BitVector32, Dictionary<K,V>, Hashtable, HybridDictionary, List<T>, NameValueCollection, OrderedDictionary, Queue, Queue<T>, SortedList, Stack, Stack<T>, StringCollection, StringDictionary.
While there can be cases where too many