Using wget, sed and gawk to get a description of the DBA_ views

I’m studying for Oracle certification, and I thought it would be handy to get a cribsheet list of all of the DBA_ views and what they are for.

Rather than create this manually, I did the following.

First I downloaded the top-level contents page and the pages it links to:

wget -r -l1 -k http://download.oracle.com/docs/cd/B19306_01/server.102/b14237/toc.htm

wget gets pages from the web. ‘-r’ tells it to follow links recursively. ‘-l1’ limits the recursion to one level (i.e. it gets the pages linked to from the first page, but not the pages linked to by them). ‘-k’ converts the links in the downloaded pages so that you can go from one page to another in the local copy.

Next I ran the following from the DOS prompt:

findstr /S DBA_ *.htm |^
findstr "<p>" |^
sed -e "s/<code>/~/" -e "s/<\/code>/~/" |^
gawk -F~ "!/same as those/{print $2 \"~\" $3,$4,$5,$6,$7,$8,$9}" |^
sed -e "s/<[^>]*>//g" |^
sort -u


I’ll go through that line by line.

Find occurrences of the string DBA_ in any htm files. The /S on findstr makes it search subdirectories too.

findstr /S DBA_ *.htm |^

Look for the HTML paragraph tag. This just cuts out some of the occurrences of DBA_ that aren’t on the descriptive line I want.

findstr "<p>" |^

This sed command replaces both the code begin and end tags with a ‘~’ (tilde). There are probably ways of avoiding this step by making the following awk do more work, but I don’t know how! The ‘sed’ executable is from the unxutils package.

sed -e "s/<code>/~/" -e "s/<\/code>/~/" |^

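The ‘same as those’ pattern is just another filter: the leading ‘!’ skips any line containing that phrase. With ‘~’ as the field separator, the second field is the name of the DBA_ view and the description follows it.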

gawk -F~ "!/same as those/{print $2 \"~\" $3,$4,$5,$6,$7,$8,$9}" |^
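To see the field splitting on its own, here is a small sketch for a Unix-style shell (single quotes, so none of the DOS prompt's quoting applies). The sample line and the function name are made up for illustration. The file name prefix is there because findstr puts one in front of each match when it searches several files, which is why the view name ends up in the second field rather than the first.

```shell
# Hypothetical line as it might look after the sed step above.
line='dir\file.htm:<p>~DBA_TABLES~ describes all relational tables in the database.</p>'

# Split on "~", skip lines containing "same as those",
# then print the view name, a tilde, and the description.
name_and_description() {
    awk -F'~' '!/same as those/ { print $2 "~" $3 }'
}

printf '%s\n' "$line" | name_and_description
# prints: DBA_TABLES~ describes all relational tables in the database.</p>
```

Each sed expression in the previous step has no g flag, so it replaces only the first match on a line. Fields beyond $3 would therefore normally be empty, which is why the sketch prints only $2 and $3.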

This bit of sed removes HTML tags from the text. To be honest I’m not entirely sure how it works, but I can see that it does. When I tried to write the sed myself, I found that if there were two tags on a line it would remove both the tags and anything between them. I found the code here: remove html tags from a file | UNIX | Tech-Recipes. One day I’ll get out my sed book and work it out!

sed -e "s/<[^>]*>//g" |^
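For anyone curious, the difference seems to come down to greedy matching. Below is a sketch for a Unix-style shell rather than the DOS prompt. The sample line and function names are made up, and the pattern <.*> is only a guess at what a first attempt might look like.

```shell
# Hypothetical sample line with more than one tag on it.
sample='<p><b>DBA_TABLES</b> describes tables.</p>'

# Greedy: "<.*>" matches from the first "<" to the LAST ">" on the line,
# so the text between the tags is swallowed too.
strip_greedy() { sed -e 's/<.*>//g'; }

# Negated class: "[^>]*" cannot cross a ">", so each match
# stops at the end of a single tag.
strip_tags() { sed -e 's/<[^>]*>//g'; }

printf '%s\n' "$sample" | strip_greedy   # prints an empty line
printf '%s\n' "$sample" | strip_tags     # prints: DBA_TABLES describes tables.
```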

This is the Unix sort, also from unxutils.

sort -u

The output from all of this is like this:

DBA_TABLES~ describes all relational tables in the database.
DBA_TABLESPACE_GROUPS~ describes all tablespace groups in the database.
DBA_TABLESPACES~ describes all tablespaces in the database.
DBA_TEMP_FILES~ describes all temporary files (tempfiles) in the database.
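The same chain can be tried without downloading anything. This is a rough Unix-shell equivalent, with grep standing in for findstr. The input lines are invented to resemble the downloaded pages (they are not real Oracle text), and the function names are only for illustration.

```shell
# Made-up input: a duplicate, a "same as those" line, and a non-paragraph line.
sample_html() {
cat <<'EOF'
<p><code>DBA_TABLES</code> describes all relational tables in the database.</p>
<p><code>DBA_TABLES</code> describes all relational tables in the database.</p>
<p>The columns of <code>DBA_TABLES</code> are the same as those in <code>ALL_TABLES</code>.</p>
<p><code>DBA_TABLESPACES</code> describes all tablespaces in the database.</p>
<td>DBA_TABLES</td>
EOF
}

crib_sheet() {
    grep 'DBA_' |                                        # lines mentioning a DBA_ view
    grep '<p>' |                                         # keep paragraph lines only
    sed -e 's/<code>/~/' -e 's/<\/code>/~/' |            # mark the view name with tildes
    awk -F'~' '!/same as those/ { print $2 "~" $3 }' |   # name ~ description
    sed -e 's/<[^>]*>//g' |                              # strip any remaining tags
    sort -u                                              # sort and drop duplicates
}

sample_html | crib_sheet
```

With this input the result is two lines, one for each view. The order they come out in depends on the locale that sort uses.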

It needed a little bit of cleaning up. I would post the whole list, but I would assume Oracle wouldn’t be pleased. I am assuming that the wget is OK – in principle it’s little different to browsing.

You can adapt the technique for the initialization parameters etc.
