Regex for matching HashTags in any language



I have a field in my application where users can enter a hashtag. I want to validate their entry and make sure it would be a proper hashtag. It can be in any language, and it should NOT be preceded by the # sign. I am writing in JavaScript.

So the following are GOOD examples:

  • Abcde45454_fgfgfg (good because: only letters, numbers and _)
  • 2014_is-the-year (good because: only letters, numbers, _ and -)
  • בר_רפאלי (good because: only letters and _)
  • арбуз (good because: only letters)

And the following are BAD examples:

  • Dan Brown (Bad because has a space)
  • OMG!!!!! (Bad because has !)
  • בר רפ@לי (Bad because has @ and a space)

We had a regex that matched only a-zA-Z0-9. To add support for other languages, we changed it to reject only whitespace and forgot to reject special characters, so here I am.

Some other StackOverflow examples I saw that didn’t work for me:

  1. Other languages don’t work
  2. Again, English only

[edit]

  • Added explanation why bad is bad and good is good
  • I don’t want a preceding # character, but if I were to add a # at the beginning, it should still be a valid hashtag
    • Basically, I don’t want to allow any special characters like !@#$%^&*()=+./,[{]};:'"?><

Answer

If your disallowed characters list is thorough (!@#$%^&*()=+./,[{]};:'"?><), then the regex is:

^#?[^\s!@#$%^&*()=+./,\[{\]};:'"?><]+$

This allows an optional leading # sign: #?. It disallows the special characters using a negated character class. I just added \s (whitespace) to the list and escaped [ and ].
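As a quick check, the pattern can be run against the examples from the question. This is only a sketch: the helper name isValidHashtag and the sample list are made up for illustration and are not part of the original answer.

```javascript
// Blacklist pattern from the answer, with the backslashes written out.
const HASHTAG_BLACKLIST = /^#?[^\s!@#$%^&*()=+./,\[{\]};:'"?><]+$/;

// Hypothetical helper: true when the input contains none of the listed characters.
function isValidHashtag(input) {
  return HASHTAG_BLACKLIST.test(input);
}

// Examples taken from the question, plus one with the optional leading #.
const samples = [
  "Abcde45454_fgfgfg", // true
  "2014_is-the-year",  // true
  "בר_רפאלי",          // true
  "арбуз",             // true
  "#арбуз",            // true (optional leading #)
  "Dan Brown",         // false (space)
  "OMG!!!!!",          // false (!)
  "בר רפ@לי",          // false (@ and space)
];

for (const sample of samples) {
  console.log(JSON.stringify(sample), isValidHashtag(sample));
}
```

Keep in mind that a blacklist only rejects what is listed: a character that is missing from the class, such as ~ or |, still passes.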

Unfortunately, JavaScript regexes did not support constructs like \p{P} (Unicode punctuation) before ES2018 added Unicode property escapes, which require the u flag. Without that support, you basically have to blacklist characters or take a different approach if the regex solution isn’t good enough for your needs.
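One such different approach, sketched here under the assumption that your runtime implements ES2018 Unicode property escapes (they only work with the u flag), is to whitelist letters, combining marks, numbers, _ and - instead of blacklisting punctuation. The name isValidHashtagUnicode is hypothetical, and the exact set of allowed categories is a guess at what the question wants.

```javascript
// Whitelist: optional leading #, then only letters (\p{L}), combining marks (\p{M}),
// numbers (\p{N}), underscore, or hyphen. Requires the u flag (ES2018 or later).
const HASHTAG_WHITELIST = /^#?[\p{L}\p{M}\p{N}_-]+$/u;

// Hypothetical helper, not from the original answer.
function isValidHashtagUnicode(input) {
  return HASHTAG_WHITELIST.test(input);
}

console.log(isValidHashtagUnicode("2014_is-the-year")); // true
console.log(isValidHashtagUnicode("בר_רפאלי"));         // true
console.log(isValidHashtagUnicode("арбуз"));            // true
console.log(isValidHashtagUnicode("Dan Brown"));        // false (space)
console.log(isValidHashtagUnicode("OMG!!!!!"));         // false (!)
console.log(isValidHashtagUnicode("a~b"));              // false (~ is not whitelisted)
```

Unlike the blacklist, this version rejects any symbol that is not explicitly allowed, so nothing has to be enumerated by hand.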



Source: stackoverflow