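// Needs: using System; using System.Linq;
string txt = "abc efg ghi\t abc\n\n123\t efg 345\n";
var result = txt.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
    .SelectMany((str, lineNumber) => str
        // Pair each character with its index in the line.
        .Select((ch, i) => new { Ch = ch, Index = i })
        // Keep whitespace positions, plus index 0 for a word that starts the line.
        .Where(item => Char.IsWhiteSpace(item.Ch) || item.Index == 0)
        // A word may start right after a whitespace, or at 0 when the line does not begin with whitespace.
        .Select(item => Char.IsWhiteSpace(item.Ch) ? item.Index + 1 : 0)
        // Take characters from the candidate start up to the next whitespace.
        .Select(start => new { Line = lineNumber, Offset = start, Word = new String(str.Skip(start).TakeWhile(ch => !Char.IsWhiteSpace(ch)).ToArray()) })
        // Consecutive whitespace produces empty candidates; drop them.
        .Where(item => item.Word.Length > 0));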
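The idea is simple. For each line, we do the following:
- Pair each character with its index.
- Find every index whose character is whitespace (plus index 0, for a word at the very start of the line); the position right after a whitespace is a candidate start of a word.
- Starting from that position, take characters until the next whitespace. Empty results, which come from consecutive whitespace, are discarded.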
This approach might not be the most efficient way to solve this problem, but it is a good exercise in using LINQ.
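For comparison, here is a minimal sketch of a single-pass version without LINQ. It assumes the same definition of a word (a run of non-whitespace characters) and the same line numbering (zero-based, with empty lines skipped by the split). The names `WordScanner` and `FindWords` are made up for this illustration, and the tuple syntax assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;

static class WordScanner
{
    // Hypothetical helper: yields (Line, Offset, Word) for each whitespace-delimited word.
    public static IEnumerable<(int Line, int Offset, string Word)> FindWords(string txt)
    {
        string[] lines = txt.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        for (int line = 0; line < lines.Length; line++)
        {
            string str = lines[line];
            int i = 0;
            while (i < str.Length)
            {
                // Skip any whitespace before the next word.
                while (i < str.Length && Char.IsWhiteSpace(str[i])) i++;
                int start = i;
                // Advance to the end of the word.
                while (i < str.Length && !Char.IsWhiteSpace(str[i])) i++;
                if (i > start)
                    yield return (line, start, str.Substring(start, i - start));
            }
        }
    }

    static void Main()
    {
        foreach (var w in FindWords("abc efg ghi\t abc\n\n123\t efg 345\n"))
            Console.WriteLine($"{w.Line} {w.Offset} {w.Word}");
    }
}
```

Each character is visited once here, whereas the LINQ version calls Skip from the beginning of the line for every candidate start.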