Sunday, 20 August 2017

How to find all hyperlinks in a webpage with a Unix shell script



For example, suppose you want to list every entry (artifact ID) published under the log4j group ID in the Maven Central repository:

https://repo1.maven.org/maven2/log4j/


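#!/usr/bin/env bash
# Page whose hyperlinks should be listed
URL="https://repo1.maven.org/maven2/log4j/"

# Fetch the page quietly, keep each <a href="..."> opening, strip everything
# except the link target, and append the results to hyperlink.lst
wget -q -O - "$URL" | \
  grep -o "<a href=['\"][^\"']*['\"]" | \
  sed -e "s/^<a href=[\"']//" -e "s/[\"']\$//" >> hyperlink.lst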


The script appends every hyperlink found on the page to hyperlink.lst, one per line.

Output:

https://repo1.maven.org/maven2/log4j/apache-log4j-extras/
https://repo1.maven.org/maven2/log4j/log4j/
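
Directory index pages often list their links as relative paths (for example log4j/ rather than a full URL). In that case the extracted entries need the page URL prefixed to produce full links like the ones above. The following is only a minimal sketch of that idea: the function names extract_links and to_absolute are hypothetical and not part of the original script, and the prefixing assumes plain relative paths (it does not normalise ../ or links that start with /).

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative, not from the original script.

# Read HTML on stdin and print the target of each <a href="..."> on its own line.
extract_links() {
  grep -o "<a href=['\"][^\"']*['\"]" |
    sed -e "s/^<a href=[\"']//" -e "s/[\"']\$//"
}

# Read links on stdin and prefix relative ones with the base URL given as $1.
# Links that already start with http:// or https:// are printed unchanged.
# Assumption: relative links are simple paths such as "log4j/".
to_absolute() {
  local base="${1%/}"   # drop one trailing slash so the join adds exactly one
  local link
  while IFS= read -r link; do
    case "$link" in
      http://*|https://*) printf '%s\n' "$link" ;;
      *)                  printf '%s/%s\n' "$base" "$link" ;;
    esac
  done
}
```

With these defined, the pipeline could be written as wget -q -O - "$URL" | extract_links | to_absolute "$URL" | sort -u > hyperlink.lst, where sort -u drops duplicate links and > overwrites the file instead of appending to it.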
