codri
Hi,
I'm writing a lexer and I want it to work with as few allocations as possible.
The `TokenStream` wraps a `Peekable<Chars>` iterator and itself implements `Iterator`, i.e. it consumes a char iterator while exposing an iterator of tokens.
The comments are inline. Basically, I'm looking for a way to get a string slice out of a `TakeWhile` iterator, since I've seen that this is possible for `Chars` (via `as_str`).
I think I might be missing some Rust feature that would let me do that, i.e. call a method on the wrapped iterator.
Thanks
```rust
struct TokenStream<'a> {
    it: std::iter::Peekable<std::str::Chars<'a>>,
}

impl<'a> Iterator for TokenStream<'a> {
    type Item = Token<'a>;

    fn next(&mut self) -> Option<Token<'a>> {
        match self.it.peek() {
            Some(&ch) => match ch {
                '0' ... '9' => {
                    // Error: cannot call as_str on the TakeWhile iterator.
                    // The Chars iterator has this method and it returns a string slice.
                    // I'm looking for a way to do the same through the iterator adapters.
                    // Do I have to implement as_str for the TakeWhile iterator?
                    // Or is there a way to access the as_str method of the Chars iterator?
                    Some(Token::Number(self.it.take_while(|a| a.is_numeric()).as_str()))
                },
                '+' => {
                    self.it.next().unwrap();
                    Some(Token::Operator(Symbol::Plus))
                },
                _ => Some(Token::End)
            },
            None => None
        }
    }
}
```
The compiler error:

```
error: no method named `as_str` found for type `std::iter::TakeWhile<std::iter::Peekable<std::str::Chars<'a>>, [closure@src/main.rs:67:30: 67:48]>` in the current scope
  --> src/main.rs:67:50
   |
67 | self.it.take_while(|a| a.is_numeric()).as_str()))
   |                                        ^^^^^^

error: aborting due to previous error
```
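For context: `as_str` is an inherent method of `std::str::Chars`, not part of the `Iterator` trait, so adapters such as `Peekable` and `TakeWhile` don't forward it. `take_while` also has a second catch for a lexer: to know where to stop it must pull the first non-matching character, and that character is then lost. The sketch below (the function name `digits_via_take_while` is hypothetical) uses `by_ref()` so `take_while` borrows the iterator instead of moving it, and shows both the allocation and the swallowed `+`:

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical helper: collect a leading run of ASCII digits the way the
/// question's `take_while` call would, into an owned (allocated) String.
fn digits_via_take_while(it: &mut Peekable<Chars<'_>>) -> String {
    // `by_ref()` lets `take_while` work on a borrow instead of consuming `it`.
    it.by_ref().take_while(|c| c.is_ascii_digit()).collect()
}

fn main() {
    let mut it = "123+456".chars().peekable();
    let number = digits_via_take_while(&mut it);
    // The digits come back as a heap-allocated String, not a &str slice.
    println!("{:?}", number); // "123"
    // The '+' that made `take_while` stop has been consumed and dropped.
    println!("{:?}", it.next()); // Some('4')
}
```

This is why the accepted approach below avoids `take_while` and checks each character before consuming it.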
Here's how I'd implement this with minimal changes to your code:
```rust
struct TokenStream<'a> {
    it: std::str::Chars<'a>,
}

#[derive(Debug)]
enum Token<'a> {
    Number(&'a str),
    Plus,
    End,
}

impl<'a> Iterator for TokenStream<'a> {
    type Item = Token<'a>;

    fn next(&mut self) -> Option<Token<'a>> {
        // "Peek" by advancing a cheap clone instead of the real iterator.
        match self.it.clone().next() {
            Some(ch) => {
                match ch {
                    '0'..='9' => {
                        // Remember the remaining input before consuming the digits.
                        let start = self.it.as_str();
                        while self.it.clone().next().map_or(false, |ch| ch.is_numeric()) {
                            self.it.next();
                        }
                        // The consumed part is the difference between the two remainders.
                        Some(Token::Number(&start[..start.len() - self.it.as_str().len()]))
                    }
                    '+' => {
                        self.it.next();
                        Some(Token::Plus)
                    }
                    _ => {
                        self.it.next();
                        Some(Token::End)
                    }
                }
            }
            None => None,
        }
    }
}

fn main() {
    let ts = TokenStream { it: "123+456+789+z".chars() };
    println!("{:?}", ts.collect::<Vec<_>>());
}
```
Output:

```
[Number("123"), Plus, Number("456"), Plus, Number("789"), Plus, End]
```
Instead of `std::iter::Peekable<std::str::Chars>`, I'm using `std::str::Chars` directly and cloning it when I need to "peek". Cloning `std::str::Chars` is pretty cheap: it's just 2 pointers. In some lexers I wrote a long time ago, I found cloning like this to be faster than using `Peekable`, at least for the `Chars` and `CharIndices` iterators.
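A minimal sketch of the clone-to-peek trick for both iterators mentioned above. A clone is an independent cursor over the same `&str`, so advancing the copy leaves the original where it was; `peek_char` and `peek_char_index` are hypothetical helper names:

```rust
use std::str::{CharIndices, Chars};

/// Hypothetical helper: look at the next char by advancing a copy of `it`.
fn peek_char(it: &Chars<'_>) -> Option<char> {
    it.clone().next()
}

/// Same trick for `CharIndices`, which also yields the byte offset.
fn peek_char_index(it: &CharIndices<'_>) -> Option<(usize, char)> {
    it.clone().next()
}

fn main() {
    let it = "7+x".chars();
    // Peeking leaves the original cursor (and its `as_str`) unchanged.
    println!("{:?} {:?}", peek_char(&it), it.as_str()); // Some('7') "7+x"

    let mut idx = "7+x".char_indices();
    idx.next();
    println!("{:?}", peek_char_index(&idx)); // Some((1, '+'))
}
```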
You should probably move the substring-extraction logic from the `Number` case into a separate function, since you'll most likely need it for other tokens as well.
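One possible shape for that helper, sketched as a hypothetical free function `take_while_str`: it advances a `Chars` iterator while a predicate holds and returns the consumed part as a slice of the original input, so nothing is allocated and the first non-matching character stays unconsumed. It relies on `Chars::as_str` returning a `&'a str` tied to the input rather than to the iterator borrow:

```rust
use std::str::Chars;

/// Hypothetical helper: advance `it` while `pred` holds and return the
/// consumed characters as a slice borrowed from the original input.
fn take_while_str<'a, F>(it: &mut Chars<'a>, mut pred: F) -> &'a str
where
    F: FnMut(char) -> bool,
{
    let start = it.as_str();
    // Peek through a clone so the first non-matching char is not consumed.
    while it.clone().next().map_or(false, |ch| pred(ch)) {
        it.next();
    }
    // Everything between `start` and the new remainder was consumed.
    &start[..start.len() - it.as_str().len()]
}

fn main() {
    let mut it = "123+abc".chars();
    let number = take_while_str(&mut it, |ch| ch.is_ascii_digit());
    println!("{:?} then {:?}", number, it.as_str()); // "123" then "+abc"
}
```

With this, the `Number` arm above reduces to `Some(Token::Number(take_while_str(&mut self.it, |ch| ch.is_numeric())))`, and an identifier arm could reuse it with a different predicate.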
codri
Thank you,
That's exactly what I was looking for. For now I had gone with storing indices in the token instead of string slices, but this seems much better.
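For comparison, a rough sketch of the index-based alternative mentioned above: each token stores byte offsets, and the text is sliced out of the source only when it is needed. `SpanToken` and `lex` are hypothetical names, and this version collects into a `Vec` rather than implementing `Iterator`, purely to keep it short:

```rust
/// Hypothetical token that stores byte offsets into the source
/// instead of a borrowed slice.
#[derive(Debug, PartialEq)]
enum SpanToken {
    Number { start: usize, end: usize },
    Plus,
    End,
}

fn lex(src: &str) -> Vec<SpanToken> {
    let mut tokens = Vec::new();
    let mut it = src.char_indices().peekable();
    while let Some(&(start, ch)) = it.peek() {
        match ch {
            '0'..='9' => {
                let mut end = start;
                // Consume digits; `end` is the byte just past the last digit.
                while let Some(&(i, c)) = it.peek() {
                    if !c.is_ascii_digit() {
                        break;
                    }
                    end = i + c.len_utf8();
                    it.next();
                }
                tokens.push(SpanToken::Number { start, end });
            }
            '+' => {
                it.next();
                tokens.push(SpanToken::Plus);
            }
            _ => {
                it.next();
                tokens.push(SpanToken::End);
            }
        }
    }
    tokens
}

fn main() {
    let src = "12+345";
    for token in lex(src) {
        match token {
            // Slice the number back out of the source only when needed.
            SpanToken::Number { start, end } => println!("Number({:?})", &src[start..end]),
            other => println!("{:?}", other),
        }
    }
}
```

The trade-off is that every consumer needs access to `src` to recover the text, whereas `&'a str` tokens carry it with them at no extra cost.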