A regex search mode

I know atuin already has a lot of search modes, and my proposal for an exact mode is still pending, but I’ve got one more: regex! Hopefully it’s the last search mode I want.

I haven’t sent a PR yet. The code is here. It’s not polished yet, but I hope someone finds it useful.

The reason for this mode is that I often can’t locate the command I want. E.g. I want to find commands that use the ts tool, but too many commands contain the substring ts. I also wanted to search for things like %H, but % is treated as a wildcard, and I can’t fix that because sql-builder lacks the support.
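To illustrate the % problem: in a SQL LIKE pattern, % and _ are wildcards, so a literal %H has to be escaped and the statement needs an ESCAPE clause. Below is a minimal sketch of the escaping side only. The function names `escape_like` and `contains_pattern` are made up for illustration, and it assumes the statement is written as `... LIKE ?1 ESCAPE '\'`. Whether sql-builder can emit that clause is exactly the open problem mentioned above, and this sketch does not solve it.

```rust
/// Escape LIKE metacharacters so they match literally.
/// Assumption: the SQL statement uses `LIKE ?1 ESCAPE '\'`.
fn escape_like(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        // `%` and `_` are LIKE wildcards; the escape character itself
        // must be escaped too.
        if matches!(c, '%' | '_' | '\\') {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Build a "contains" pattern: unescaped `%` on both sides,
/// user input escaped in the middle. (Hypothetical helper.)
fn contains_pattern(input: &str) -> String {
    format!("%{}%", escape_like(input))
}
```

With this, a search for `%H` would be sent as the pattern `%\%H%`, so only the outer two % act as wildcards.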

As for the too-many-search-modes problem, I suggest adding an option to specify which modes can be switched to (and in which order).

I’d really rather we be able to use regex or exact without adding yet another search mode

EG

searching for /a regex/ will search using regex, triggered by the /

But otherwise regex would definitely be cool to have

Overall - I regret adding search modes entirely. We should instead have had a search syntax that can support the sorts of queries people wish to do


Good idea, but / appears in the command quite often. Having to escape a path is annoying.

I’m sure there is an alternative we could use, really / was just an example. I’m pretty strongly opposed to more search modes at this point, and would rather we tried other alternatives first. If there’s nothing else we can do then I think it makes sense, but I think this case has plenty of other options.

In an ideal world, this would be the only search mode we add from here

And fit all the required functionality into it

That sounds very nice! Do you already have a syntax spec, or let me come up with one?

It will likely resemble/be the SQLite fts5 spec: SQLite FTS5 Extension

So nothing unusual

Wrt the regex specifically, feel free to figure something out. More than happy to discuss though!

I do fairly like the idea of r/foo/ for regex, and I think it very very unlikely that would conflict with anything anyone actually wants to search. Open to other ideas.

I’ve come up with a syntax:

query         ::= single_query (space single_query)*
single_query  ::= neg_query | grouped_query | or_query | term
neg_query     ::= '!' single_query
grouped_query ::= '(' space? query space? ')'
or_query      ::= single_query space '|' space single_query

space         ::= ' '+
term          ::= regex_term | prefix_term | bare_term

regex_term    ::= '/' ANYTHING_BUT_LAZY '/' regex_flags?
regex_flags   ::= 'i'

prefix_term   ::= prefix ':' (regex_term | string_term)

string_term   ::= '^'? string '$'?
string        ::= quoted_string | bare_string
quoted_string ::= '"' escaped_char* '"'
bare_string   ::= NON_SPACE_CHAR+

(Alternatives are matched from left to right, and first match wins.)

I hope I wrote it correctly.

There are some remaining issues:

  • There can be syntax errors in the query string. We need a way to show an error message.
  • Where should the fuzzy match fit? I’m thinking of an option to treat bare_string as either a fuzzy match or a substring match.
  • Being able to match word boundaries will be very convenient but I’m not sure how to indicate it.

I think that looks pretty reasonable! Thank you!

I don’t think we need to support quoted_string for column queries (cwd:/foo/bar), as I can’t see any cases where we’d need to specify a value that may contain whitespace. Can always add it in the future, but it would obviously complicate parsing.

I think we should do absolutely anything but show an error message. If you went to google something and it threw an EXPECTED_RIGHT_QUOTE at you it wouldn’t be a very nice experience

As for where fuzzy matching should fit: I don’t think it should, personally. I don’t think shoe-horning fzf syntax in here will work very well, and if users strongly desire it they can stick to the fuzzy mode.

Otherwise; the regex matching syntax is definitely something that could fit into any search mode we have now, so I don’t think #1455 is a blocker for it.

Directory names can contain spaces (e.g. Program Files or Telegram Desktop), although not often. I also want to be able to match spaces in the command string, so quoted_string would be reused there anyway.

But not showing anything is worse. For a bad regex or date, for example, a user who doesn’t realize there is a syntax error will wonder why something that is there doesn’t show up. Incomplete input will be treated as if it were terminated at the end of the query, though.
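A rough sketch of what "incomplete input ends at the end" could mean for quoted_string: an unterminated quote is accepted and simply runs to the end of the query instead of raising an error. The function name `read_quoted` is hypothetical, and the backslash escape is an assumption, since the grammar above does not define escaped_char.

```rust
/// Read a quoted string from the start of `input`.
/// Returns the unescaped content and the remaining input, or `None`
/// if `input` does not start with a double quote.
/// Assumption: backslash escapes the next character.
fn read_quoted(input: &str) -> Option<(String, &str)> {
    let body = input.strip_prefix('"')?;
    let mut chars = body.char_indices();
    let mut out = String::new();
    while let Some((i, c)) = chars.next() {
        match c {
            // Closing quote: `"` is one byte, so the rest starts at i + 1.
            '"' => return Some((out, &body[i + 1..])),
            '\\' => {
                // Take the escaped character literally, if there is one.
                if let Some((_, next)) = chars.next() {
                    out.push(next);
                }
            }
            _ => out.push(c),
        }
    }
    // No closing quote: treat the string as ending at the end of input.
    Some((out, ""))
}
```

So `"Program Fi` typed halfway already searches for `Program Fi`, which keeps search-as-you-type usable without an error message for every intermediate keystroke.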

I want to use a parser to do the parsing, no more string manipulation :)

I think it would make sense for a regex, but not really anywhere else right now

I don’t think this is necessary, and is just extra complexity we don’t need. Most of the search syntax can be handled by an underlying search index (SQLite fts, tantivy, whatever). Kinda eliminates the requirement for us to implement it ourselves. I’d rather avoid an overly verbose parser combinator or generation step.

Realistically, for the regex you are proposing, a string match would be very simple.

Anyway, to summarize

I’m not against a parser for enhanced search syntax, but I’d like to ensure that it’s absolutely necessary and outline what we wish to achieve by using one vs using existing search technology (which already parses).

Otherwise, wrt regex (which would be good to have), I suggested this previously:

Which yes, is a “string match”, but a very trivial one at that. We can then support regex across the board, pretty easily, and without thinking about search indices or parsing queries.

Writing a parser (or using a generated one) means we’ll need to start thinking about tokenization and word boundaries, which again, is better handled by an actual search index.

Actually I find my proposal very limiting: it doesn’t support multi-term queries or negation.

The r/.../ syntax will solve this, but it is not very simple to match as long as you want spaces in it. And I do want spaces in my search terms.

I hate almost every open-source full-text search indexer in existence because of CJK issues.

On the other hand, I don’t want a full-text search index for my command history; I want exact matches. Commands are not a natural language. (And those indexers don’t support regex anyway.)

I did that in my project in the past, and tried to add my feature on top of it…and in the end I implemented a parser that partially parses its search syntax.

Btw, in #1455 you mentioned scoring. I also think that’s a bad idea for atuin. I need the results to be ordered by time to determine which of several similar-looking commands is the one I want. (There is an issue where imported history is sorted in reverse. I didn’t try to fix it because my history has already been imported. When items are old and have a 0s duration, I assume they are in reverse order.)

I’m not sure what you mean. It’s still pretty easy to match on if you wish to have spaces in between / and /? We can literally just check the prefix and suffix of a query.
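For what it’s worth, the prefix/suffix check for the r/.../ idea could be as small as the sketch below. `Query` and `classify` are hypothetical names, not anything in atuin today, and the resulting pattern would still need to be handed to a regex engine (not shown here).

```rust
/// Hypothetical classification of a raw query string.
#[derive(Debug, PartialEq)]
enum Query<'a> {
    /// The text between `r/` and the trailing `/`.
    Regex(&'a str),
    /// Anything else, passed on unchanged to the current search mode.
    Plain(&'a str),
}

fn classify(query: &str) -> Query<'_> {
    // Only the very start and very end are inspected, so spaces and
    // slashes inside the pattern need no escaping.
    let inner = query
        .strip_prefix("r/")
        .and_then(|rest| rest.strip_suffix('/'));
    match inner {
        Some(pattern) if !pattern.is_empty() => Query::Regex(pattern),
        _ => Query::Plain(query),
    }
}
```

A query like `r/git (push|pull)/` is classified as a regex, while `cd /usr/bin/` stays a plain query because it does not start with `r/`. The limitation raised earlier still applies: this handles one regex per query, with no multi-term queries or negation.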

Anyway. I’d rather not spend ages discussing the benefits of various parsers and search indices, I’d be open to adding a regex search mode if that makes life easier here.

I’m afraid implementing a search index that better supports CJK is very, very out of scope here. Happy to take suggestions if you know of anything better.

Fulltext search indices are great for many things that are not natural language. In fact, Tantivy (one I have proposed) is mostly used and developed for searching logs and traces. Which are also not natural language.

It may not suit you, but a lot of users have asked for smarter sorting based on things that are not just time.

EG

I think we can keep the fulltext search mode you like, but what I’m proposing in 1455 is unlikely to be something you’d be happy with.

My plan is to make this search scoring customizable, as people tend to have very strong feelings here.

Would something like this sort out the CJK issues for you?

OK, but I’m not going to invest in fts5 or tantivy, so I won’t help implement #1455.

It’s enough for atuin, but not good enough for general use (for Chinese it’s basically unigram-based, but I want bigrams to keep results relevant).
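To make the unigram/bigram distinction concrete, here is a small character n-gram sketch. It is not how any particular indexer tokenizes; `ngrams` is a made-up helper that only shows what the two token sets look like.

```rust
/// Split `text` into overlapping character n-grams.
/// Returns an empty list when `n` is 0 or the text is shorter than `n`.
fn ngrams(text: &str, n: usize) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    if n == 0 || chars.len() < n {
        return Vec::new();
    }
    chars
        .windows(n)
        .map(|window| window.iter().collect())
        .collect()
}
```

For 搜索历史, n = 1 gives the four single characters, so any document containing those characters anywhere can match. With n = 2 the tokens are 搜索, 索历 and 历史, which only match when the characters are adjacent, and that is what keeps results relevant.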

I’ll try to implement this.
