Category Archives: Python

Recursive grep-like search for extracting URLs from a bunch of files

import os
import re
import sys

# Crazy URL regexp from Gruber
# http://daringfireball.net/2010/07/improved_regex_for_matching_urls
r = re.compile(r'(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?]))')

# grep -r: walk the tree under the directory given as the first argument
for parent, dnames, fnames in os.walk(sys.argv[1]):
    for fname in fnames:
        filename = os.path.join(parent, fname)
        if os.path.isfile(filename):
            # Python 3: errors='replace' keeps binary or oddly encoded files
            # from raising UnicodeDecodeError while reading line by line
            with open(filename, errors='replace') as f:
                # c is the 1-based line number
                for c, line in enumerate(f, 1):
                    match = r.search(line)
                    if match:
                        # <file>:<line>:<match>
                        print('%s:%s:%s' % (filename, c, match.group(0)))
                        # <match>
                        #print(match.group(0))
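
The loop above reports only the first URL on each line, because search() stops at the first match. A minimal sketch of how every URL on a line could be collected with finditer() instead; the function name urls_in_line and the simplified pattern are illustrative assumptions, not part of the original script, and the Gruber regexp could be compiled in its place.

```python
import re

# Simplified stand-in pattern (an assumption for brevity); the Gruber
# regexp from the script could be compiled here instead.
URL_RE = re.compile(r'https?://[^\s<>]+')


def urls_in_line(line, pattern=URL_RE):
    """Return every non-overlapping match of pattern in line, left to right."""
    return [m.group(0) for m in pattern.finditer(line)]
```

Calling urls_in_line() inside the inner loop and printing one row per returned URL would keep the <file>:<line>:<match> output format.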

Source: Recursive grep-like search for extracting URLs from a bunch of files

File Handling in Python

The access modes available for the open() function are as follows:

  • r: Opens the file for reading only, with the pointer at the beginning of the file. This is the default mode for the open() function, and it fails if the file doesn’t exist.
  • rb: Opens the file for reading only in binary mode, starting from the beginning of the file. Binary mode is typically used for non-text data such as images and videos.
  • r+: Opens a file for reading and writing, placing the pointer at the beginning of the file. The file must already exist and its contents are not truncated.
  • w: Opens a file for writing only. The pointer is placed at the beginning of the file, and any existing file with the same name is truncated, so its previous contents are lost. A new file is created if one with the same name doesn’t exist.
  • wb: Opens a file for writing only in binary mode.
  • w+: Opens a file for writing and reading. Like w, it truncates an existing file or creates a new one.
  • wb+: Opens a file for writing and reading in binary mode.
  • a: Opens a file for appending new information to it. The pointer is placed at the end of the file. A new file is created if one with the same name doesn’t exist.
  • ab: Opens a file for appending in binary mode.
  • a+: Opens a file for both appending and reading.
  • ab+: Opens a file for both appending and reading in binary mode.

Source: File Handling in Python
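
The behaviour of the access modes listed above can be checked with a short experiment. The following is a minimal sketch for Python 3; the function name demo_modes and the result keys are hypothetical names chosen for illustration, and the path passed in is assumed to be a scratch file that is safe to overwrite.

```python
import os
import tempfile


def demo_modes(path):
    """Exercise several open() modes on path and return what each step reads back."""
    results = {}
    # w: creates the file (or truncates an existing one) and writes from the start
    with open(path, "w") as f:
        f.write("first\n")
    # a: keeps the existing content and writes at the end
    with open(path, "a") as f:
        f.write("second\n")
    # r: the default mode, reads from the beginning
    with open(path, "r") as f:
        results["after_append"] = f.read()
    # r+: read and write without truncating; the pointer starts at the beginning
    with open(path, "r+") as f:
        f.write("FIRST")  # overwrites the first five characters in place
        f.seek(0)
        results["after_r_plus"] = f.read()
    # a+: writes go to the end, so seek(0) is needed before reading everything
    with open(path, "a+") as f:
        f.write("third\n")
        f.seek(0)
        results["after_a_plus"] = f.read()
    # w+: truncates on open, so the earlier content is gone
    with open(path, "w+") as f:
        results["after_w_plus"] = f.read()
    # rb: binary mode returns bytes rather than str
    with open(path, "rb") as f:
        results["rb_type"] = type(f.read()).__name__
    return results


if __name__ == "__main__":
    # Run against a throwaway file inside a temporary directory
    with tempfile.TemporaryDirectory() as tmp:
        for step, value in demo_modes(os.path.join(tmp, "demo.txt")).items():
            print(step, repr(value))
```

The temporary directory keeps the experiment away from real files, which matters because w and w+ discard existing content as soon as the file is opened.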