Quick recipes to extract URLs — JavaScript, Python, Bash
JavaScript (browser or Node)
- Use a robust regex to find http/https URLs:
```javascript
const text = "Visit https://example.com/page?x=1 and http://sub.example.org.";
const urls = [...text.matchAll(/https?:\/\/[^\s"'<>]+/gi)].map(m => m[0]);
console.log(urls);
```
- To extract hrefs from HTML in browser:
```javascript
const anchors = Array.from(document.querySelectorAll('a[href]'));
const hrefs = anchors.map(a => a.href);
```
Python
- Simple regex to extract full URLs:
```python
import re

text = "See https://example.com and http://sub.example.org/page"
pattern = r'https?://[^\s"\'<>]+'
urls = re.findall(pattern, text)
print(urls)
```
- Use urllib/BeautifulSoup for HTML-safe extraction:
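BeautifulSoup (the third-party beautifulsoup4 package) is the usual choice for this. As a dependency-free sketch of the same idea, the standard library's HTMLParser can collect hrefs from anchor tags:

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag fed to it."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

html = '<p><a href="https://example.com">Example</a> <a href="/rel">Rel</a></p>'
parser = HrefCollector()
parser.feed(html)
print(parser.hrefs)  # ['https://example.com', '/rel']
```

Unlike the regex recipes, this also picks up relative hrefs, which you can resolve against a base URL with urllib.parse.urljoin.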
Bash
- Using grep (PCRE) to extract http/https links from a file or stdin:
```bash
# GNU grep with -P (PCRE) and -o to print only matches
grep -oP 'https?://[^\s"'\''<>]+' file.txt
```
- Using awk (portable-ish):
```bash
awk '{
  while (match($0, /https?:\/\/[^ \t"'\''<>]+/)) {
    print substr($0, RSTART, RLENGTH)
    $0 = substr($0, RSTART + RLENGTH)
  }
}' file.txt
```
Notes (concise)
- The regexes above work for common cases but can miss edge cases, e.g. URLs wrapped in parentheses or followed by sentence punctuation, which the broad character class sweeps into the match.
- For HTML, prefer HTML parsers (DOM in JS, BeautifulSoup in Python) over regex.
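One cheap mitigation for the punctuation edge case: match broadly, then strip trailing punctuation from each hit. A minimal sketch in Python:

```python
import re

text = "See (https://example.com/page), or https://example.org."
raw = re.findall(r'https?://[^\s"\'<>]+', text)
# Strip common trailing punctuation that the broad character class picks up.
urls = [u.rstrip('.,);:') for u in raw]
print(urls)  # ['https://example.com/page', 'https://example.org']
```

This still isn't perfect (a URL that legitimately ends in a stripped character will be truncated), but it handles the most frequent prose cases.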