How to parse HTML or get page title (from Fetch API)

linroex · March 14, 2021, 8:14am

Hi

I save links into OmniFocus to read later. I want to write an automation plug-in that converts each link into its page title and sets that as task.name, for example:

https://facebook.com -> Facebook

I use the Fetch API to get the raw HTML, and then I need to parse the HTML to get the title. Right now I use a regex to get the title, but I think this is a crude approach. Is there a way to do the same thing that is better than a regex?

MartinPacker · March 14, 2021, 8:52am

If it is just the title then a RegEx is probably the most practical way.

If you want more than that then you need to be doing what’s called “tree walking”. Basically the web page gets turned into a tree structure and you navigate the tree to extract what you want.
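As a rough illustration of tree walking, here is a minimal sketch in JavaScript. The node shape (`tag`, `text`, `children`), the `findFirst` helper, and the hand-written `page` object are all assumptions made up for this example; a real HTML parser would build the tree for you and would have its own node type.

```javascript
// Hypothetical node shape: { tag: string, text?: string, children?: node[] }.
// A real parser would produce its own node type; this is only a stand-in.

// Depth-first search for the first node with the given tag name.
function findFirst(node, tag) {
  if (node.tag === tag) {
    return node;
  }
  for (const child of node.children || []) {
    const found = findFirst(child, tag);
    if (found !== null) {
      return found;
    }
  }
  return null; // nothing in this subtree matched
}

// Hand-written tree standing in for a parsed page.
const page = {
  tag: "html",
  children: [
    { tag: "head", children: [{ tag: "title", text: "Facebook" }] },
    { tag: "body", children: [{ tag: "h1", text: "Welcome" }] },
  ],
};

const titleNode = findFirst(page, "title");
console.log(titleNode ? titleNode.text : "(no title)");
```

The same walk works for any other element, which is why this approach pays off once you need more than the title.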

But - just for the title - a RegEx looking for <title> and </title> and extracting what’s between them is cheap, easy, and quick.
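A minimal sketch of that RegEx approach in JavaScript might look like the following. The function name `extractTitle` is hypothetical, and the HTML is assumed to have been fetched already and passed in as a string. It does not decode HTML entities such as `&amp;`, so treat it as a starting point rather than a complete solution.

```javascript
// Hypothetical helper: return the text between <title> and </title>,
// or null if the HTML has no title element.
function extractTitle(html) {
  // \b and [^>]* tolerate attributes on the opening tag;
  // [\s\S]*? matches lazily across newlines; the i flag ignores case.
  const match = /<title\b[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  if (match === null) {
    return null;
  }
  // Collapse runs of whitespace (including newlines) and trim the ends.
  return match[1].replace(/\s+/g, " ").trim();
}

const sample = "<html><head><TITLE>\n  Facebook\n</TITLE></head><body></body></html>";
console.log(extractTitle(sample)); // prints "Facebook"
```

Returning null for a missing title lets the caller fall back to the original URL as the task name.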

(And, yes, I’ve done precisely this - but not with OmniAutomation.)