



text file is loaded using another text editor, such as Vim, the text is displayed as shown in Figure 10-4. As you can see, Vim has loaded the text file without any formatting errors.



■Note  Vim is available from http://vim.org. It is a vi-derived clone that can be used on Windows systems.



Figure 10-4. Vim loads the text file in a nicely formatted display.



      The more pressing problem lies in the structure of the data, which is illustrated in Figure 10-5. Here, the data has new formatting, with extra columns, and the first column is not always in the proper data format. To make matters worse, the badly formatted data contains repeated information.

      The challenge of the application is to read the stream and fix all of the problems. This requires a thorough understanding of string processing and of the different ways that text can be stored, as discussed in Chapter 3. When you are processing data streams, you need to be aware of the format of the data stream. In this example, we are processing ASCII text, and thus will be interpreting bytes according to the rules of the ASCII lookup table.

      Whitespace characters are special characters in the text lookup table. Each is associated with a number, but instead of being drawn as a visible glyph, it is represented by an action, such as moving the cursor. For example, the character between single quotation marks (' ') is a space, the character \t (ControlChars.Tab) is a tab, and the character \n (ControlChars.Lf) is a newline. Notepad does not format the lottery text file nicely (Figure 10-3) because of the whitespace characters the file uses to indicate a newline. In Figure 10-6, the highlighted buffer entry 0A is the hexadecimal character that indicates a linefeed, or newline, in the lottery text file.
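To see which newline bytes a string actually contains without opening a hex editor, you can print its ASCII bytes in hexadecimal, much like the buffer view in Figure 10-6. The following is a minimal sketch; the module name HexDumpSketch and the function ToHex are hypothetical, and the sample strings are made up for illustration.

```vb
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module HexDumpSketch
    ' Hypothetical helper: returns the ASCII bytes of a string as space-separated,
    ' two-digit uppercase hex values, so whitespace such as tab (09),
    ' linefeed (0A), and carriage return (0D) becomes visible.
    Function ToHex(ByVal text As String) As String
        Dim bytes As Byte() = Encoding.ASCII.GetBytes(text)
        Dim parts(bytes.Length - 1) As String
        For i As Integer = 0 To bytes.Length - 1
            parts(i) = bytes(i).ToString("X2")
        Next
        Return String.Join(" ", parts)
    End Function

    Sub Main()
        ' A line terminated by a single linefeed
        Console.WriteLine(ToHex("12" & ControlChars.Lf))
        ' A line terminated by a carriage return followed by a linefeed
        Console.WriteLine(ToHex("12" & ControlChars.CrLf))
    End Sub
End Module
```

Running it prints 31 32 0A for the first string and 31 32 0D 0A for the second; the 0A byte is the same linefeed highlighted in Figure 10-6.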



CHAPTER 10 ■ LEARNING ABOUT PERSISTENCE



Figure 10-5. Structural problems of this data stream



Figure 10-6. Newline character used in lotto.txt



      Figure 10-7 shows a file created by Notepad. Notepad expects not a single whitespace character but two to indicate a newline: 0D (carriage return) followed by 0A (linefeed).
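One way to make a linefeed-only file display correctly in Notepad is to convert every newline to the carriage return/linefeed pair. The sketch below is only an illustration under that assumption; the names LineEndingSketch and ToWindowsNewlines are hypothetical, and it handles only the LF and CR+LF endings discussed here.

```vb
Imports System
Imports Microsoft.VisualBasic

Module LineEndingSketch
    ' Hypothetical helper: converts every line ending to CR+LF.
    ' Existing CR+LF pairs are first collapsed to a lone LF so they
    ' are not turned into CR+CR+LF by the second replacement.
    Function ToWindowsNewlines(ByVal text As String) As String
        Dim lfOnly As String = text.Replace(vbCrLf, vbLf)
        Return lfOnly.Replace(vbLf, vbCrLf)
    End Function

    Sub Main()
        Dim unixText As String = "first" & vbLf & "second" & vbLf
        Dim fixedText As String = ToWindowsNewlines(unixText)
        ' Each line now ends with 0D 0A instead of 0A
        Console.Write(fixedText)
    End Sub
End Module
```

The order of the two Replace calls is the design choice that matters: normalizing to LF first makes the function safe to run on text that already mixes both conventions.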





Figure 10-7. Newline characters used by Notepad



Deciphering the Format 



The echo has served its purpose of providing a way to develop an application in a top-down manner. The next step is to remove the echo code and start writing the code that will fix the data stream.

     Fixing the data stream is not a trivial undertaking, because you are yet again faced with a state problem. You don't want to fix one part of the stream, only to create a problem in another part. Thus, you need to fix the stream incrementally and make sure at each step that the change has no unintended side effects.

     The first step is to break the data stream into individual fields (in this case, each value in a column is a field). In Figure 10-5, the data stream has two parts: the upper part seems to have a single space between the numbers, while the lower part has as many spaces as necessary to align the numbers. The difference between the upper and lower parts is the whitespace characters used. So, the first step will be to clean up the whitespace.
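As a rough illustration of what cleaning up the whitespace could mean, the sketch below collapses any run of spaces and tabs to a single space, so that single-spaced and column-aligned lines end up in the same shape. This is one possible approach, not necessarily the one the chapter goes on to use; the names WhitespaceSketch and NormalizeWhitespace are hypothetical, and the sample lines are made up.

```vb
Imports System
Imports Microsoft.VisualBasic

Module WhitespaceSketch
    ' Hypothetical helper: splits on spaces and tabs, discards the empty
    ' strings produced by consecutive separators, and rejoins the
    ' remaining fields with exactly one space.
    Function NormalizeWhitespace(ByVal line As String) As String
        Dim fields As String() = line.Split(New Char() {" "c, ControlChars.Tab}, _
                                            StringSplitOptions.RemoveEmptyEntries)
        Return String.Join(" ", fields)
    End Function

    Sub Main()
        ' A single-spaced line and an aligned line normalize to the same text
        Console.WriteLine(NormalizeWhitespace("7 12 25"))
        Console.WriteLine(NormalizeWhitespace("7   12" & ControlChars.Tab & "  25"))
    End Sub
End Module
```

Unlike the Process() code in this section, which deliberately keeps empty fields so you can inspect them, StringSplitOptions.RemoveEmptyEntries throws them away, which is convenient once you already know the layout.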

     The following code reads the buffer, splits it up, and reassembles the content into a new buffer. It is intermediate code that surrounds each field with bracket markers so you can see what the text contains.



Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

' TODO: Fix up this class
Public Class LottoTicketProcessor : Implements IProcessor
    Public Function Process(ByVal input As String) As String _
        Implements IProcessor.Process
        Dim reader As TextReader = New StringReader(input)
        Dim retval As New StringBuilder()
        ' Peek() returns -1 once every character has been read
        Do While reader.Peek() <> -1
            ' Split the next line into fields at each space or tab
            Dim splitUpText As String() = _
                reader.ReadLine().Split(New Char() {" "c, ControlChars.Tab})
            Dim c1 As Integer
            For c1 = 0 To splitUpText.Length - 1
                ' Wrap each field in brackets so its boundaries are visible
                retval.Append("(" & splitUpText(c1) & ")")
            Next
            ' Terminate each reassembled line
            retval.Append(ControlChars.NewLine)
        Loop
        Return retval.ToString()
    End Function
End Class



                In the implementation of Process(), the text is parsed line by line, and each line is then split into its individual fields. You could write the parsing routines yourself, but to parse a buffer line by line, it is more efficient to use StringReader. StringReader accepts the string to parse, and the instance is assigned to a variable of type TextReader, the abstract base class from which StringReader derives.

                As each line of text is parsed, the most efficient way to build the output buffer is to use StringBuilder. You could keep appending data to a String, but if you do that too often, the application's performance will suffer.

                The String type is immutable, which means that once an object is initialized, you cannot change its state. The advantage of immutable types is that they can increase the speed of your application, because code can assume that once an object has been assigned, it will never change. The downside is that to modify the object's state even slightly, you must instantiate a new object, which is what would happen if we built the buffer by assigning concatenated strings with = or by using += (or &=). The StringBuilder type is like String, except that the text it references can be modified in place.
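The difference between an immutable String and a StringBuilder can be seen in a few lines. This is a minimal sketch; the module name StringBuilderSketch and the function Bracket are hypothetical, and the inputs are made up.

```vb
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module StringBuilderSketch
    ' Hypothetical helper: wraps each item in parentheses and joins them.
    ' All appends go into one growable StringBuilder buffer instead of
    ' creating a new String object for every concatenation.
    Function Bracket(ByVal items As String()) As String
        Dim sb As New StringBuilder()
        For Each item As String In items
            sb.Append("(").Append(item).Append(")")
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' &= does not change the existing String; it creates a new one
        ' and points s at it, so saved still refers to the old text.
        Dim s As String = "ab"
        Dim saved As String = s
        s &= "c"
        Console.WriteLine(saved)   ' ab
        Console.WriteLine(s)       ' abc

        Console.WriteLine(Bracket(New String() {"1", "2", "3"}))   ' (1)(2)(3)
    End Sub
End Module
```

Because saved still prints ab after the &=, you can see that the original String object was never modified; only the variable s was redirected to a new object.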

                In the Process() implementation, the Do While loop calls the Peek() method, which returns the next character in the stream without removing it. If there is nothing more to read, Peek() returns -1. Otherwise, data is available, and ReadLine() can be called. ReadLine() reads characters until it encounters a newline or carriage return character and returns them as a string without the line terminator. Once a line of text has been read, it is split into individual fields using the Split() method. The split characters are the space and the tab character (ControlChars.Tab).
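Because ReadLine() accepts a linefeed, a carriage return, or a carriage return/linefeed pair as a line terminator, the Peek()/ReadLine() loop works on the lottery file regardless of which newline convention it uses. The sketch below isolates that loop; the names ReadLineSketch and ReadAllLines are hypothetical, and the input string is made up.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic

Module ReadLineSketch
    ' Hypothetical helper: the same Peek()/ReadLine() pattern as Process(),
    ' collecting each line (without its terminator) into a list.
    Function ReadAllLines(ByVal input As String) As List(Of String)
        Dim lines As New List(Of String)()
        Using reader As TextReader = New StringReader(input)
            Do While reader.Peek() <> -1
                lines.Add(reader.ReadLine())
            Loop
        End Using
        Return lines
    End Function

    Sub Main()
        ' One LF ending, one CR+LF ending, and a final line with no terminator
        Dim mixed As String = "first" & vbLf & "second" & vbCrLf & "third"
        For Each line As String In ReadAllLines(mixed)
            Console.WriteLine("[" & line & "]")
        Next
    End Sub
End Module
```

The program prints [first], [second], and [third] on separate lines; no stray carriage return is left attached to second.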

                When the Split() method returns, the individual fields are assigned to the array splitUpText. The loop iterates over those array elements and appends each one to the StringBuilder variable retval, surrounded by a set of brackets. The brackets mark the boundaries of each field, so you can inspect exactly what data has been found; I include them purely for debugging purposes. Because I am reformatting the stream line by line, I append a newline (ControlChars.NewLine, a carriage return followed by a linefeed) to retval after each line's fields.
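The brackets are especially revealing for the column-aligned part of the file: Split() produces an empty string between every pair of adjacent separators, and those empty fields show up as (). The following sketch reproduces the bracketing for a single line; the names SplitSketch and BracketFields are hypothetical, and the sample lines are made up.

```vb
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module SplitSketch
    ' Hypothetical helper: splits one line on spaces and tabs, exactly as
    ' Process() does, and wraps each resulting field in brackets.
    Function BracketFields(ByVal line As String) As String
        Dim fields As String() = line.Split(New Char() {" "c, ControlChars.Tab})
        Dim sb As New StringBuilder()
        For Each field As String In fields
            sb.Append("(" & field & ")")
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Console.WriteLine(BracketFields("7 12"))     ' (7)(12)
        Console.WriteLine(BracketFields("7   12"))   ' (7)()()(12)
    End Sub
End Module
```

Three consecutive spaces yield two empty fields, which is why the aligned lines appear to have more fields than the single-spaced ones.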

                After all of the lines of text and the fields within them have been iterated, the ToString() method returns the contents of the StringBuilder instance as a string. Running the code shows how many fields each line of text has and how you should format the text file. This gives you an understanding of how the file is structured.
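If you want the field counts as numbers rather than reading them off the bracketed output, a small diagnostic can report them per line. This is a hypothetical helper, not part of the chapter's code; the names FieldCountSketch and FieldCounts are assumptions, and the input is made up.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic

Module FieldCountSketch
    ' Hypothetical diagnostic: for each line, counts the raw fields that
    ' Split() returns on spaces and tabs, including empty fields.
    Function FieldCounts(ByVal input As String) As List(Of Integer)
        Dim counts As New List(Of Integer)()
        Using reader As TextReader = New StringReader(input)
            Do While reader.Peek() <> -1
                Dim fields As String() = _
                    reader.ReadLine().Split(New Char() {" "c, ControlChars.Tab})
                counts.Add(fields.Length)
            Loop
        End Using
        Return counts
    End Function

    Sub Main()
        Dim sample As String = "1 2 3" & vbLf & "1   2 3"
        For Each n As Integer In FieldCounts(sample)
            Console.WriteLine(n)
        Next
    End Sub
End Module
```

Both sample lines hold the same three values, yet the counts differ (3 and 5), which is exactly the kind of structural inconsistency the bracketed output exposes.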

                The following is sample output from the lotto.txt file:





(2000.01.1
