5.4. Searching, Modifying, and Encoding a String's Content

This section describes string methods that are used to perform diverse but familiar tasks such as locating a substring within a string, changing the case of a string, replacing or removing text, splitting a string into delimited substrings, and trimming leading and trailing spaces.

Searching the Contents of a String

A string is an implicit zero-based array of chars that can be searched using the array syntax string[n], where n is a character position within the string. For locating a substring of one or more characters in a string, the string class offers the IndexOf and IndexOfAny methods. Table 5-2 summarizes these.
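The indexer and the two search methods can be sketched as follows. This is a minimal illustration, not a listing from the text: the sample contents of poem, the SearchDemo class, and the FirstVowel helper are assumptions made for the example. StringComparison.Ordinal is passed so that the substring search does not depend on the current culture.

```csharp
using System;

public static class SearchDemo
{
    // Hypothetical helper: position of the first lowercase vowel, or -1 if none.
    public static int FirstVowel(string s)
    {
        return s.IndexOfAny(new char[] { 'a', 'e', 'i', 'o', 'u' });
    }

    public static void Main()
    {
        string poem = "In Xanadu did Kubla Khan";   // sample text (assumed)

        char ch = poem[3];                                       // zero-based indexer
        int first = poem.IndexOf("did", StringComparison.Ordinal);
        int later = poem.IndexOf('K', 15);                       // search starts at position 15
        int none = poem.IndexOf("Kublai", StringComparison.Ordinal);

        Console.WriteLine(ch);                 // X
        Console.WriteLine(first);              // 10
        Console.WriteLine(later);              // 20
        Console.WriteLine(none);               // -1 (not found)
        Console.WriteLine(FirstVowel(poem));   // 4
    }
}
```

Both methods return -1 when nothing matches, so the result should be checked before it is used as an index.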
Searching a String That Contains Surrogates

All of these techniques assume that a string consists of a sequence of 16-bit characters. Suppose, however, that your application must work with characters that do not fit in 16 bits, such as those in some Far Eastern character sets. Each such character is represented in storage as a surrogate pair consisting of a high and a low 16-bit value. This presents a problem for an expression such as poem[ndx], which would return only half of a surrogate pair.

For applications that must work with surrogates, .NET provides the StringInfo class, which treats all characters as text elements and can automatically detect whether a character is a single 16-bit value or a surrogate pair. Its most important member is the GetTextElementEnumerator method, which returns an enumerator that can be used to iterate through the text elements in a string.
// StringInfo and TextElementEnumerator are in System.Globalization
TextElementEnumerator tEnum =
   StringInfo.GetTextElementEnumerator(poem);
while (tEnum.MoveNext())              // Step through the string
{
   Console.WriteLine(tEnum.Current);  // Print current text element
}
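To make the surrogate problem concrete, the sketch below compares the number of 16-bit char values in a string with the number of text elements that StringInfo reports. The sample string, the SurrogateDemo class, and the CountTextElements helper are assumptions chosen for illustration; U+20000 is a CJK ideograph that requires a surrogate pair.

```csharp
using System;
using System.Globalization;

public static class SurrogateDemo
{
    // Hypothetical helper: counts text elements rather than 16-bit char values.
    public static int CountTextElements(string s)
    {
        return new StringInfo(s).LengthInTextElements;
    }

    public static void Main()
    {
        // 'a', one ideograph stored as a surrogate pair, then 'b' (assumed sample)
        string poem = "a\U00020000b";

        Console.WriteLine(poem.Length);                    // 4: the ideograph occupies two chars
        Console.WriteLine(char.IsHighSurrogate(poem[1]));  // True: poem[1] is half of the pair
        Console.WriteLine(CountTextElements(poem));        // 3
    }
}
```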
Recall from the discussion of enumerators in Chapter 4, "Working with Objects in C#," that MoveNext() and Current are members implemented by all enumerators.

String Transformations

Table 5-3 summarizes the most important string class methods for modifying a string. Because the original string is immutable, any string constructed by these methods is actually a new string with its own allocated memory.

Most of these methods have analogues in other languages and behave as you would expect. Somewhat surprisingly, as we see in the next section, most of these methods are not available in the StringBuilder class. Only Replace, Remove, and Insert are included.

String Encoding

Encoding comes into play when you need to convert between strings and bytes for operations such as writing a string to a file or streaming it across a network. Character encoding and decoding offer two major benefits: efficiency and interoperability. Most English text consists of characters that can be represented by 8 bits. Encoding can be used to strip the extra byte (from the 16-bit Unicode memory representation) for transmission and storage. The flexibility of encoding is also important in allowing an application to interoperate with legacy data or third-party data encoded in different formats.

The .NET Framework supports many forms of character encoding and decoding. The most frequently used include the following:
Encoding and decoding are performed using the Encoding class found in the System.Text namespace. This abstract class has several static properties that return an object used to implement a specific encoding technique. These properties include ASCII, UTF8, and Unicode. The latter is used for UTF-16 encoding.

An encoding object offers several methods, each having several overloads, for converting between characters and bytes. Here is an example that illustrates two of the most useful methods: GetBytes, which converts a text string to bytes, and GetString, which reverses the process and converts a byte array to a string.

string text= "In Xanadu did Kubla Khan";
Encoding UTF8Encoder = Encoding.UTF8;
byte[] textChars = UTF8Encoder.GetBytes(text);
Console.WriteLine(textChars.Length);       // 24
// Store using UTF-16
textChars = Encoding.Unicode.GetBytes(text);
Console.WriteLine(textChars.Length);       // 48
// Treat characters as two bytes
string decodedText = Encoding.Unicode.GetString(textChars);
Console.WriteLine(decodedText);            // "In Xanadu did ... "

You can also instantiate the encoding objects directly. In this example, the UTF-8 object could be created with

UTF8Encoding UTF8Encoder = new UTF8Encoding();

With the exception of ASCIIEncoding, the constructor for these classes defines parameters that allow more control over the encoding process. For example, you can specify whether an exception is thrown when invalid encoding is detected.
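As a sketch of that last point, the UTF8Encoding constructor accepts a second Boolean argument, throwOnInvalidBytes, that requests an exception when invalid bytes are decoded. The EncodingDemo class, the IsValidUtf8 helper, and the sample byte arrays below are assumptions made for illustration.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: true if the bytes decode as valid UTF-8.
    public static bool IsValidUtf8(byte[] bytes)
    {
        // First argument: do not emit a byte order mark.
        // Second argument: throw on invalid input instead of substituting a character.
        UTF8Encoding strict = new UTF8Encoding(false, true);
        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    public static void Main()
    {
        byte[] good = Encoding.UTF8.GetBytes("Kubla Khan");
        byte[] bad = { 0x4B, 0xFF, 0x4B };   // 0xFF never occurs in valid UTF-8

        Console.WriteLine(IsValidUtf8(good));   // True
        Console.WriteLine(IsValidUtf8(bad));    // False
    }
}
```

The object returned by Encoding.UTF8 does not throw in this situation; it substitutes a replacement character for the invalid bytes, so the strict form is useful mainly when corrupt input must be detected rather than silently accepted.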