6. Metaweb Write Services
This chapter explains, and demonstrates with examples, how to deliver MQL write queries to the Metaweb mqlwrite service. As a necessary prerequisite, it shows how to log in to Metaweb to obtain authentication credentials, and how to present those credentials in subsequent writes. The chapter also demonstrates how to upload content, such as images and HTML documents, to the Metaweb content store.
The examples in this chapter are all written in Python. They all communicate with the Metaweb services at sandbox.freebase.com. And they are all designed to be run as command-line utilities. Note that this differs from Chapter 4, which placed more emphasis on server-side code for web applications. Many Metaweb-enabled web applications will be read-only, but the Python code shown in this chapter should be straightforward to port for use in server-side scripts if you need it.
No registration or authentication is required to use the mqlread service, but writing to Metaweb requires you to log in first. Before we can cover the mqlwrite service, therefore, we must explain the login service.
Because Metaweb services are HTTP-based, Metaweb authentication is cookie-based. You log in by making an HTTP POST request to the URL http://www.freebase.com/api/account/login (the examples in this chapter use the same path on sandbox.freebase.com), passing your username and password as URL-encoded form parameters. If login is successful, Metaweb returns one or more HTTP cookies in the response headers. These cookies contain your authentication credentials, and you must pass them back to Metaweb in the HTTP request headers of all subsequent write and upload requests.
The names and values of the authentication cookies are an implementation detail rather than a specification detail, and are subject to change. To ensure success, your code must accept all cookies returned by the login service, and must present all of them to the mqlwrite service. If you write your applications using a suitably high-level HTTP library, cookie handling may be performed automatically for you. For this chapter, however, we explicitly handle the cookies at a lower level.
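For comparison with the explicit approach used in this chapter, here is a minimal sketch of the high-level alternative. It uses the standard library's cookie jar: urllib2 and cookielib in Python 2, or urllib.request and http.cookiejar in Python 3. The helpers make_session() and login_request() are hypothetical names introduced only for this illustration. The code builds the login request but does not send it.

```python
try:  # Python 3 module names
    from http.cookiejar import CookieJar
    from urllib.request import HTTPCookieProcessor, Request, build_opener
    from urllib.parse import urlencode
except ImportError:  # Python 2 equivalents
    from cookielib import CookieJar
    from urllib2 import HTTPCookieProcessor, Request, build_opener
    from urllib import urlencode

LOGIN_URL = 'http://sandbox.freebase.com/api/account/login'

def make_session():
    """Return (opener, jar). Set-Cookie headers on responses fetched through
    opener are stored in jar, and the stored cookies are added to later
    requests made through the same opener."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    return opener, jar

def login_request(username, password):
    """Build, but do not send, the POST request for the login service."""
    body = urlencode({'username': username, 'password': password})
    return Request(LOGIN_URL, body.encode('ascii'),
                   {'Content-type': 'application/x-www-form-urlencoded'})

# Usage (requires network access, so it is not run here):
#   opener, jar = make_session()
#   opener.open(login_request('fred', 'secret'))  # cookies are stored in jar
#   opener.open(some_write_request)               # placeholder: cookies sent back
```

With this design the credentials never appear in your own code. The trade-off is that you must route every request through the same opener.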
The cookies returned by the login service are persistent, which means that you do not have to log into the freebase.com client each time you visit the site. Nevertheless, when writing scripts that use Metaweb services, the best practice is to assume that your cookies have expired and log in each time the script is invoked.
Example 6.1 shows Python code that defines a metaweb.login() utility function. This function sends a username and password to the login service, and returns cookies that can be treated as opaque authentication credentials. If login fails, the function raises a metaweb.MQLError exception. This login utility function is part of a larger metaweb.py module. Other examples in this chapter are also part of the module. We first saw metaweb.py in Chapter 4, where Example 4.3 defined a metaweb.read() utility function. Example 4.3 also includes the definition of the MQLError exception class used in this example. Example 6.1 (and all our other Python examples) also depends on the simplejson module, which is available from http://cheeseshop.python.org/pypi/simplejson.
Example 6.1. metaweb.py: logging in to Metaweb with Python
(The full metaweb.py file is available as Appendix B.)
import httplib     # Low-level HTTP networking
import urllib      # URL encoding
import simplejson  # JSON serialization and parsing
host = 'sandbox.freebase.com'        # The Metaweb host
loginservice = '/api/account/login'  # Path to login service
# Submit the specified username and password to the Metaweb login service.
# Return opaque authentication credentials on success.
# Raise MQLError on failure.
def login(username, password):
    # Establish a connection to the server and make a request.
    # Note that we use the low-level httplib library instead of urllib2.
    # This allows us to manage cookies explicitly.
    conn = httplib.HTTPConnection(host)
    conn.request('POST',          # POST the request
                 loginservice,    # The URL path /api/account/login
                 # The body of the request: encoded username/password
                 urllib.urlencode({'username':username, 'password':password}),
                 # This header specifies how the body of the post is encoded.
                 {'Content-type': 'application/x-www-form-urlencoded'})
    # Get the response from the server
    response = conn.getresponse()
    if response.status == 200:  # We get HTTP 200 OK even if login fails
        # Parse response body and raise a MQLError if login failed
        body = simplejson.loads(response.read())
        if not body['code'].startswith('/api/status/ok'):
            error = body['messages'][0]
            raise MQLError('%s: %s' % (error['code'], error['message']))
        # Otherwise return cookies to serve as authentication credentials.
        # The set-cookie header holds one or more cookie specifications,
        # separated by commas. Each specification is a name, an equal
        # sign, a value, and optionally one or more trailing clauses that
        # consist of a semicolon and some metadata. We don't care about the
        # metadata. We just want to return a semicolon-separated list of
        # name=value pairs, which is the format of the Cookie request header.
        # Metadata such as an expires date can itself contain a comma, so
        # splitting on commas may produce fragments that are not cookies;
        # we discard any fragment whose first part has no equal sign.
        cookies = response.getheader('set-cookie').split(',')
        pairs = [c.split(';')[0].strip() for c in cookies]
        return '; '.join([p for p in pairs if '=' in p])
    else:  # This should never happen
        raise MQLError('HTTP Error: %d %s' % (response.status, response.reason))
As you can see from Example 6.1, the login service expects URL-encoded username and password parameters in the body of an HTTP POST request to the path /api/account/login. The login service always returns an HTTP status code of "200 OK", even when login fails. The body of the response is a JSON-formatted object, and the code property of this object specifies whether the login succeeded or failed. If the code property begins with "/api/status/ok", then login succeeded, and the response will include a "set-cookie" header that contains authentication credentials.
If the login fails, then the code property of the JSON object will be something other than "/api/status/ok". In this case, the messages property is an array (typically with just one element) of message objects. The message property of the first element of this array holds an error message that describes what went wrong. (Typically, this is "Invalid username or password".)
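The success test and error extraction just described can be checked offline against hand-written response bodies. The sketch below factors them into a hypothetical helper, login_error(). It uses the standard library's json module, which provides the same loads() function as simplejson. Only the code/messages/message layout of the sample bodies comes from the description above. The specific error code strings are invented for illustration.

```python
import json

def login_error(text):
    """Given the body of a login response, return None if its code property
    starts with /api/status/ok, otherwise a "code: message" string built
    from the first element of its messages array."""
    body = json.loads(text)
    if body['code'].startswith('/api/status/ok'):
        return None
    error = body['messages'][0]
    return '%s: %s' % (error['code'], error['message'])

# Hand-written sample bodies; the error code strings are hypothetical.
OK_BODY = '{"code": "/api/status/ok"}'
FAILED_BODY = ('{"code": "/api/status/error", "messages": [{"code": '
               '"/api/status/error/auth", "message": "Invalid username or password"}]}')
```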
Chapter 5 demonstrated, in great detail, how to express write queries in MQL. Now we explain how to send those queries to Metaweb and retrieve the results. The Metaweb write service is mqlwrite. The path to this service is /api/service/mqlwrite, and it responds to HTTP POST requests only. One or more write queries are wrapped in envelope objects, as described for the mqlread service in section 4.2.3. The JSON-serialized envelope is then submitted in the body of the POST request as the value of the queries parameter.
Authentication credentials are included in the HTTP Cookie header of the request. The mqlwrite service performs the query, wraps the result in a response envelope object, and returns the JSON serialization of that object as the body of its HTTP response.
The query and response envelopes are described in the sub-sections that follow, and those explanations are followed by example code that performs writes.
The mqlwrite service has outer and inner query and response envelopes just like the mqlread service. The structure of these envelopes is described in detail in section 4.2.3. More than one mqlwrite can be requested in one HTTP transaction, although in practice most mqlwrite requests contain a single query.
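To make the envelope structure concrete without contacting the server, here is a sketch of how the body of an mqlwrite POST can be assembled for any number of named queries. The helper mqlwrite_body() is hypothetical. It follows the outer/inner envelope shape that metaweb.write() builds in Example 6.2, and it uses the standard library's json module in place of simplejson.

```python
import json
try:
    from urllib.parse import urlencode, parse_qs   # Python 3
except ImportError:
    from urllib import urlencode                   # Python 2
    from urlparse import parse_qs

def mqlwrite_body(queries):
    """Encode {query_name: mql_write_query} as the form-encoded POST body.
    Each query goes into an inner envelope {"query": ...}, the inner
    envelopes are collected in an outer envelope keyed by query name, and
    the JSON text of the outer envelope is the value of queries."""
    outer = dict((name, {'query': q}) for name, q in queries.items())
    return urlencode({'queries': json.dumps(outer)})

# Round trip: decode the body again to see what the server would receive.
body = mqlwrite_body({'q1': {'create': 'unless_exists', 'id': None,
                             'type': '/common/topic', 'name': 'my test object'}})
sent = json.loads(parse_qs(body)['queries'][0])
# sent == {'q1': {'query': {'create': 'unless_exists', 'id': None,
#                           'type': '/common/topic', 'name': 'my test object'}}}
```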
6.2.2. A mqlwrite Utility Function
Example 6.2 is another piece of our metaweb.py module. It defines a metaweb.write() function that submits a MQL write query to the mqlwrite service and returns the result. The query that is submitted and the result that is returned are both Python dictionary objects. The utility function takes care of JSON serialization and parsing. It also seals the query in a query envelope and opens the response envelope to extract the query result. Note that metaweb.write() expects authentication credentials as its second argument. Use it with the metaweb.login() function in code like this:
import metaweb # Use the metaweb module
query = {'create':'unless_exists', 'id':None, # This is our query
'type':'/common/topic', 'name':'my test object'}
credentials = metaweb.login("name", "pass") # Login first
result = metaweb.write(query, credentials) # Execute query
print "%s: %s" % (result['create'], result['id']) # Print result details
The code in Example 6.2 uses the metaweb.MQLError exception class. That class was defined in Example 4.3.
Example 6.2. metaweb.py: sending a query to mqlwrite
(The full metaweb.py file is available as Appendix B.)
import urllib      # URL encoding
import urllib2     # Higher-level URL content fetching
import simplejson  # JSON serialization and parsing
host = 'sandbox.freebase.com'           # The Metaweb host
writeservice = '/api/service/mqlwrite'  # Path to mqlwrite service
# Submit the MQL write query and return the result as a Python object.
# Authentication credentials are required, obtained from login().
# Raises MQLError if the query was invalid. Raises urllib2.HTTPError if
# mqlwrite returns an HTTP status code other than 200.
def write(query, credentials):
    # We're requesting this URL
    req = urllib2.Request('http://%s%s' % (host, writeservice))
    # Send our authentication credentials as a cookie
    req.add_header('Cookie', credentials)
    # This custom header is required and guards against XSS attacks
    req.add_header('X-Metaweb-Request', 'True')
    # The body of the POST request is encoded URL parameters
    req.add_header('Content-type', 'application/x-www-form-urlencoded')
    # Wrap the query object in a query envelope
    envelope = {'qname': {'query': query}}
    # JSON encode the envelope
    encoded = simplejson.dumps(envelope)
    # Use the encoded envelope as the value of the queries parameter in the
    # body of the request. Specifying a body automatically makes this a POST.
    req.add_data(urllib.urlencode({'queries':encoded}))
    # Now do the POST
    f = urllib2.urlopen(req)
    response = simplejson.load(f)  # Parse HTTP response as JSON
    inner = response['qname']      # Open outer envelope; get inner envelope
    # If anything was wrong with the invocation, mqlwrite will return an HTTP
    # error, and the code above will raise urllib2.HTTPError.
    # If anything was wrong with the query, we will get an error status code
    # in the response envelope. In that case, we raise our own MQLError.
    if not inner['code'].startswith('/api/status/ok'):
        error = inner['messages'][0]
        raise MQLError('%s: %s' % (error['code'], error['message']))
    # If there was no error, then just return the result from the envelope
    return inner['result']
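When a write misbehaves, it helps to see exactly what write() sends without contacting the server. The hypothetical helper prepare_write() below mirrors the body of write() up to, but not including, the urlopen() call. It returns the request object so that its URL, headers, and body can be inspected. The helper is written to run under both Python 2 and Python 3.

```python
import json
try:
    from urllib.request import Request   # Python 3
    from urllib.parse import urlencode
except ImportError:
    from urllib2 import Request          # Python 2
    from urllib import urlencode

WRITE_URL = 'http://sandbox.freebase.com/api/service/mqlwrite'

def prepare_write(query, credentials, name='qname'):
    """Build, but do not send, the POST request that write() would send."""
    envelope = {name: {'query': query}}                  # outer/inner envelopes
    body = urlencode({'queries': json.dumps(envelope)})  # form-encoded body
    req = Request(WRITE_URL, body.encode('ascii'))       # a body makes it a POST
    req.add_header('Cookie', credentials)
    req.add_header('X-Metaweb-Request', 'True')
    req.add_header('Content-type', 'application/x-www-form-urlencoded')
    return req
```

Note that Request normalizes header names with str.capitalize(), so a header added as X-Metaweb-Request is stored, and looked up with get_header(), as X-metaweb-request.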
Example 6.1 and Example 6.2 are helpful utility functions, but they are just utilities, not real-world examples of how you might write to Metaweb. For the sake of example, let's suppose that you are a coin collector and you think that freebase.com should include information about each of the coins issued by the US Mint under its 50 State Quarters program. First, visit the US Mint's website [17] to find out when each state quarter was released, how many were minted for each state, and when each state became a state.
Next, create a Metaweb type to model this data. Log in to the freebase.com client and create a new type named "US State Quarter". (Review the procedure for creating types and adding properties in Chapter 5, if necessary.) If your Freebase username is "fred", this will create a type with id /user/fred/default_domain/us_state_quarter.
Next, give your type four properties:
- A property named "State", of type /location/us_state, to link the quarter to its state. (The freebase.com client specifies types by name rather than by id, so use the type name "US State" in the client.)
- A property named "Release", of type /type/datetime, to specify the date on which the quarter was released into circulation. (Use the type name "Date/Time".)
- A property named "Mintage", of type /type/int, to specify how many quarters were minted. (Use the type name "Integer".)
- A property named "Statehood", of type /type/datetime, to specify when the state gained statehood.
Be sure to make each of these properties unique by clicking the "Restrict to one value" checkbox.
Next, you need to get your data into manageable form. Extract data from the US Mint site, and arrange it in a plain text file named quarters.txt that looks like the following:
Delaware,1999-01-04,1787-12-07,774824000
Pennsylvania,1999-03-08,1787-12-12,707332000
New Jersey,1999-05-17,1787-12-18,662228000
Georgia,1999-07-19,1788-01-02,939932000
Connecticut,1999-10-12,1788-01-09,1346624000
Massachusetts,2000-01-03,1788-02-06,1163784000
Each line in this file is the data for a single quarter. Fields are separated by commas. The first field is the name of the state. The second and third fields are the release date and statehood date for that state. And the fourth field is the mintage for that state quarter.
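Before writing anything, it can be worth checking that every line of quarters.txt parses. The hypothetical helper quarter_writes() below performs the same line-to-query conversion as the loop in Example 6.3, but as a plain function that runs offline. Unpacking each line into exactly four fields makes a malformed line fail with ValueError before any data reaches Metaweb.

```python
def quarter_writes(lines, typeid):
    """Turn lines of quarters.txt into a list of MQL write queries, one per
    quarter, shaped like the queries built in quarters.py. Blank lines are
    skipped; a line without exactly four fields raises ValueError."""
    queries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        state, release, statehood, mintage = line.split(',')
        queries.append({'create': 'unless_exists',
                        'id': None,
                        'type': ['/common/topic', typeid],
                        'name': state + ' State Quarter',
                        'release': release,
                        'statehood': statehood,
                        'mintage': int(mintage),
                        'state': {'name': state, 'type': '/location/us_state'}})
    return queries

SAMPLE_LINES = ['Delaware,1999-01-04,1787-12-07,774824000\n',
                '\n',
                'New Jersey,1999-05-17,1787-12-18,662228000\n']
```

In practice you would pass the open quarters.txt file object as lines and hand the returned list to metaweb.write().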
With our type created, and the data in this format, we can now write a simple script to upload the data to freebase.com. Example 6.3 shows Python code to do this. Note that you need to insert your own Freebase username and password into the script to make it work for you.
Example 6.3. quarters.py: writing a data set to Metaweb
import metaweb  # Use the metaweb module
USERNAME = 'username'  # Put your Freebase username and password here
PASSWORD = 'password'
# The ID for our US State Quarter type depends on our username
TYPEID = '/user/' + USERNAME + '/default_domain/us_state_quarter'
# Make sure we can log in before we go any further
credentials = metaweb.login(USERNAME, PASSWORD)
# We will be creating multiple quarters in a single MQL query.
# We start with an empty array and add MQL writes to it in the loop below
query = []
# Open our file of quarter data
f = open("quarters.txt", "r")
# Loop through lines of the file
for line in f:
    # Skip blank lines, such as an empty line at the end of the file
    if not line.strip(): continue
    # Break each line into fields
    fields = line.strip().split(',')
    # This query creates a single quarter
    q = {'create':'unless_exists',              # Create a new object
         'id':None,                             # And return its id
         'type':["/common/topic", TYPEID],      # Make it a topic and a quarter
         'name': fields[0] + ' State Quarter',  # The object's name
         'release': fields[1],                  # Release date
         'statehood': fields[2],                # Statehood date
         'mintage': int(fields[3]),             # How many minted
         'state': { 'name': fields[0], 'type': '/location/us_state' }} # connect to State
    # Add this write to the array of writes
    query.append(q)
# Close the data file
f.close()
# Now send our one big query to Metaweb and get the result
result = metaweb.write(query, credentials)
# Display the id of the Metaweb object for each state
for r in result:
    print "%s %s: %s" % (r['create'], r['state']['name'], r['id'])
As we saw in Chapter 5, the MQL write grammar allows multiple independent writes to be specified in a single query as elements of a JSON array. The mqlwrite service also allows multiple named queries to be placed in an envelope. Suppose we want to create two objects in a single call to mqlwrite. Here are two envelopes that can accomplish that:
2 Writes in 1 Query:
{
  "query":[{
    "create":"unless_exists",
    "type":"/common/topic",
    "name":"my test object #1"
  },{
    "create":"unless_exists",
    "type":"/common/topic",
    "name":"my test object #2"
  }]
}
2 Queries in 1 Envelope:
{
  "query1":{
    "create":"unless_exists",
    "type":"/common/topic",
    "name":"my test object #3"
  },
  "query2":{
    "create":"unless_exists",
    "type":"/common/topic",
    "name":"my test object #4"
  }
}
When you include multiple writes in a single query, the writes are executed atomically: they all succeed or they all fail. As a result, they are not allowed to depend on each other, and there is no way to tell what order they are executed in.
If you submit multiple queries in a single envelope, they are not atomic. Each one succeeds or fails on its own. JSON envelopes are unordered collections of properties, and there is no guarantee that the queries will be executed in the order in which they are written. For this reason, queries that share an envelope should not depend on each other.
The fact that mqlwrite can accept multiple queries explains why an envelope object is required in the first place: it gives each query a name that can be used to identify its result. If multiple named queries are submitted in an envelope, then the response envelope contains multiple results. If the queries are named q1 and q2, then the responses are available as result.q1 and result.q2.
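Because queries that share an envelope succeed or fail independently, code that submits several named queries should check each inner envelope on its own rather than stopping at the first error. The hypothetical helper split_results() below sketches this. It assumes each inner response envelope has the code/result/messages layout that metaweb.write() reads in Example 6.2. As a precaution, it ignores top-level values that are not dictionaries with a code property. The sample response is made up.

```python
def split_results(response):
    """Given a decoded mqlwrite response envelope holding several named
    queries, return (results, errors): results maps each successful query
    name to its result, errors maps each failed name to "code: message"."""
    results, errors = {}, {}
    for name, inner in response.items():
        # Only dictionaries with a code property are treated as inner
        # envelopes; any other top-level values are ignored.
        if not isinstance(inner, dict) or 'code' not in inner:
            continue
        if inner['code'].startswith('/api/status/ok'):
            results[name] = inner['result']
        else:
            msg = inner['messages'][0]
            errors[name] = '%s: %s' % (msg['code'], msg['message'])
    return results, errors

# Hypothetical response in which q1 succeeded and q2 failed.
SAMPLE = {
    'q1': {'code': '/api/status/ok',
           'result': {'create': 'created', 'id': '/guid/0001'}},
    'q2': {'code': '/api/status/error',
           'messages': [{'code': '/api/status/error/mql', 'message': 'bad query'}]},
}
```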
The unique feature of Metaweb is the way that it stores relationships between objects. But, like any database, it can also store large chunks of data, such as long HTML documents or binary image files. In Chapter 4 we learned about the trans service for retrieving content. Here, we'll learn how to upload content to be stored in Metaweb. The service for uploading content is named upload, and has the URL path /api/service/upload. The upload service responds only to HTTP POST requests. The data to be uploaded is passed in the body of the request. The MIME type of the content, as well as the character encoding of textual content, is specified in the HTTP Content-Type request header.
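For textual content, the body must be bytes in some character encoding, and that encoding travels in the Content-Type header. One plausible way to do this is to use the standard HTTP charset parameter, as in "text/html; charset=utf-8". The sketch below makes that assumption. The helper prepare_text_upload() and its charset default are hypothetical, and the request is built but not sent.

```python
try:
    from urllib.request import Request   # Python 3
except ImportError:
    from urllib2 import Request          # Python 2

UPLOAD_URL = 'http://sandbox.freebase.com/api/service/upload'

def prepare_text_upload(text, mimetype, credentials, charset='utf-8'):
    """Build, but do not send, an upload request for textual content.
    The text is encoded to bytes with charset, and the same charset is
    declared in the Content-Type header (an assumed convention)."""
    body = text.encode(charset)
    req = Request(UPLOAD_URL, body)
    req.add_header('Content-Type', '%s; charset=%s' % (mimetype, charset))
    req.add_header('Cookie', credentials)
    req.add_header('X-Metaweb-Request', 'True')
    return req
```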
Example 6.4 shows the Python implementation of a metaweb.upload() utility. Pass it a string of content (Python strings can be binary data or textual data), a MIME type for the content, and the authentication credentials returned when you logged in. upload() creates a new /type/content object to hold your content and returns the guid of that object to you. This guid can be used to retrieve the content with the /api/trans/raw service (see Chapter 4).
Example 6.4. metaweb.py: uploading content to Metaweb
(The full metaweb.py file is available as Appendix B.)
import urllib2     # Higher-level URL content fetching
import simplejson  # JSON serialization and parsing
host = 'sandbox.freebase.com'          # The Metaweb host
uploadservice = '/api/service/upload'  # Path to upload service
# Upload the specified content (and give it the specified MIME type).
# Return the guid of the /type/content object that represents it.
# The returned guid can be used to retrieve the content with /api/trans/raw.
def upload(content, mimetype, credentials):
    # This is the URL we POST content to
    url = 'http://%s%s' % (host, uploadservice)
    # Build the HTTP request
    req = urllib2.Request(url, content)          # URL and content to POST
    req.add_header('Content-Type', mimetype)     # Content type header
    req.add_header('Cookie', credentials)        # Authentication header
    req.add_header('X-Metaweb-Request', 'True')  # Guard against XSS attacks
    f = urllib2.urlopen(req)                     # POST the request
    response = simplejson.load(f)                # Parse the response
    if not response['code'].startswith('/api/status/ok'):
        error = response['messages'][0]
        raise MQLError('%s: %s' % (error['code'], error['message']))
    return response['result']['id']              # Extract and return content id
To demonstrate the metaweb.upload() utility of Example 6.4, let's continue with our US State Quarter example. The US Mint has images of each of the state quarters on its website. Let's upload those images to freebase.com, and make them visible through the /common/topic/image property of each quarter object. Example 6.5 shows how to do this.
In order to understand this code, you have to know that the upload service handles images specially. When you upload images, the /type/content object is also given the type /common/image with various properties filled out. Because your uploaded /type/content is co-typed as /common/image, you can link directly to image content from /common/topic/image.
Example 6.5. quarterpix.py: uploading images to Metaweb
import metaweb  # Our metaweb utilities
import urllib2  # For downloading images from the mint server
USERNAME = 'username'  # Put your Freebase username and password here
PASSWORD = 'password'
# The ID for our US State Quarter type depends on our username
TYPEID = '/user/' + USERNAME + '/default_domain/us_state_quarter'
# Make sure we can log in before we go any further
credentials = metaweb.login(USERNAME, PASSWORD)
# All the image files are beneath this URL
imagedir = 'http://www.usmint.gov/images/mint_programs/50sq_program/states/'
# This dictionary maps state name to image name.
images = { 'Delaware': 'DE_winner.gif',
           'Pennsylvania': 'PA_winner.gif',
           'New Jersey': 'NJ_winner.gif',
           'Georgia': 'GA_winner.gif',
           'Connecticut': 'CT_winner.gif',
           'Massachusetts': 'MA_winner.gif'}
# Loop through the states
for state, filename in images.items():
    # First, download the image from the Mint's website
    image = urllib2.urlopen(imagedir + filename)
    mimetype = image.info()['Content-Type']
    content = image.read()
    # Now upload it to Metaweb
    content_id = metaweb.upload(content, mimetype, credentials)
    # Define a write query to link the quarter object to the uploaded image
    query = { 'type': TYPEID,
              'state': state,
              '/common/topic/image': { 'id': content_id, 'connect': 'insert' }}
    # Submit the query and get the result
    result = metaweb.write(query, credentials)
    # Output the result
    print "%s: %s %s" % (state, result['/common/topic/image']['connect'], content_id)
Once you have run the code in Example 6.5, use the freebase.com client to view your state quarter topics. You'll see that there are now images on the page.
6.3.3. Uploading Documents
It is fairly easy to upload an image and make it visible to users of freebase.com. It is a little trickier to do the same for textual content. When you upload a document, a /type/content object is created for that document. This allows the content to be retrieved with /api/trans/raw, but it doesn't allow it to be viewed in any natural way in the client. To accomplish that, you must create a /common/document object to reference the content, and (optionally) a /common/topic object to reference the document. Example 6.6 shows how you can do this.
Example 6.6. uploaddoc.py: uploading HTML documents to Metaweb
import sys, re, metaweb
# Read the content of the file specified on the command line
# It must be an HTML file with a <title>
if len(sys.argv) != 2:
    sys.exit("usage: uploaddoc.py file.html")
filename = sys.argv[1]
try:
    f = open(filename)
    doc = f.read()
    f.close()
except Exception, e:
    sys.exit(e)
# Search through the document for a title
match = re.search("(?i)<title>(.*)</title>", doc)
if match is None:
    sys.exit("Document has no title")
title = match.group(1)
# Log in to Metaweb. Put your own username and password here
credentials = metaweb.login('username','password')
# Upload the document content to Metaweb.
# Note that we hardcode the text/html content type.
content_id = metaweb.upload(doc, "text/html", credentials)
# Submit a MQL write query to create a /common/topic and /common/document
# for the uploaded content
result = metaweb.write({'create':'unless_exists',
                        'type':'/common/topic',
                        'id':None,
                        'name':title,
                        'article' : { 'create':'unless_exists',
                                      'type':'/common/document',
                                      'id':None,
                                      'content':content_id }},
                       credentials)
# Tell the user what we did
print "Uploaded %s: %s\n\tcontent: %s\n\tdocument: %s %s\n\ttopic: %s %s" % (
    filename, title, content_id,
    result['article']['create'], result['article']['id'],
    result['create'], result['id'])
This Python program expects the name of an HTML file as a command-line argument. It reads the file and determines the document title by searching for a <title> tag. It uploads the document text with the metaweb.upload() function of Example 6.4. Then it submits a MQL write query to create a /common/topic that refers to a /common/document that refers to the uploaded content. It uses the document title as the name of the /common/topic object.
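Example 6.6 locates the title with the regular expression (?i)<title>(.*)</title>. By default, "." does not match a newline, so that pattern misses a title split across lines. Because (.*) is greedy, it also overruns when one line holds a second </title>. The hypothetical helper html_title() below sketches a slightly more tolerant extraction. It is still a regular expression, not an HTML parser.

```python
import re

# Non-greedy match, case-insensitive, "." also matches newlines, and the
# opening tag may carry attributes.
TITLE_RE = re.compile(r'<title[^>]*>(.*?)</title>', re.IGNORECASE | re.DOTALL)

def html_title(doc):
    """Return the whitespace-normalized text of the first <title> element,
    or None if the document has none."""
    match = TITLE_RE.search(doc)
    if match is None:
        return None
    return ' '.join(match.group(1).split())
```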
As discussed in Chapter 5, Metaweb does not allow objects to be deleted. The closest we can come to deleting an object is deleting all the links between that object and others. When important links such as the object's name and type are deleted, the object effectively ceases to have any useful identity. Example 6.7 is an object unlinking utility, which attempts to unlink all objects you have created (or a subset with the specified type and/or name) in the database. For each object to be deleted, it first queries the set of types of the object. Then it queries the value of all properties defined by all of those types. It turns the results of this read query into a write query, by adding a "connect":"delete" directive to each property value. Finally, it executes this write query to unlink all properties of the object.
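The key step in that process, turning a read result into a write query, can be isolated and tested offline. The hypothetical function unlink_query() below sketches the transformation. It assumes, as in Example 6.7, that every property other than id was read as an optional array. Unlike the loop in Example 6.7, it copies each element instead of modifying the read result in place. The sample read result is invented.

```python
def unlink_query(read_result):
    """Given the result of a read query in which every property other than
    id was requested as an optional array, return a write query that
    deletes every link found. The input is left unmodified."""
    write = {'id': read_result['id']}
    for prop, values in read_result.items():
        if prop == 'id' or not values:
            continue  # keep the id as is; skip properties with no links
        # Copy each element and add the delete directive to the copy
        write[prop] = [dict(v, connect='delete') for v in values]
    return write

# Hypothetical read result for an object with one name, one type, no keys
READ_RESULT = {'id': '/guid/0001',
               'name': [{'value': 'my test object', 'lang': '/lang/en'}],
               'type': [{'id': '/common/topic'}],
               'key': []}
```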
Example 6.7 performs a number of read queries as well as writes. To do this, it uses the metaweb.read() utility defined in Example 4.3 of Chapter 4.
Example 6.7 is a relatively complex example. It demonstrates the mqlwrite service, of course, but also demonstrates low-level reads and writes of types and properties. If you can make sense of this example, you have a solid understanding of the Metaweb architecture.
Example 6.7. unlink.py: unlinking Metaweb objects
import sys, getopt  # Command-line parsing
import metaweb      # login, read, and write utilities
# A cache that maps type ids to arrays of property information
typecache = {}
# Primitive types other than /type/text and /type/key, which are
# handled specially
primitives = set(["/type/boolean", "/type/datetime", "/type/float",
                  "/type/id", "/type/int", "/type/rawstring", "/type/uri"])
# Return information about the properties of type t.
# Use a cache to avoid repeated queries
def getPropertiesOfType(t):
    if (t not in typecache):
        props = metaweb.read({'type':'/type/type',
                              'id':t,
                              'properties':[{'id':None,
                                             'expected_type':None}]})
        typecache[t] = props['properties']
    return typecache[t]
#
# Return a MQL write query that will unlink the object with the specified id
#
def makeUnlinkQuery(id):
    # Find all types of the object
    types = metaweb.read({ 'id':id, 'type':[] })['type']
    # Make a list of all properties of all types of the object
    props = []
    for t in types:
        props.extend(getPropertiesOfType(t))
    # We start off with queries for the well-known properties of /type/object
    q = { 'id': id,
          'name': [{'value':None, 'lang':None, 'optional':True}],
          'type': [{'id':None, 'optional':True}],
          'key': [{'value':None, 'namespace':None, 'optional':True}]}
    # Add properties to this query to ask for the id or value of all
    # other properties of all types. Note that the way we ask for the value
    # of a property depends on the expected type of the property
    for p in props:
        propid = p['id']
        proptype = p['expected_type']
        # We now need to add propid to the query.
        # The value of the property depends on its expected type.
        if proptype == '/type/text':
            q[propid] = [{ 'value':None, 'lang':None, 'optional':True }]
        elif proptype == '/type/key':
            q[propid] = [{ 'value':None, 'namespace':None, 'optional':True }]
        elif proptype in primitives:
            q[propid] = [{ 'value':None, 'optional':True }]
        else:
            q[propid] = [{ 'id': None, 'optional':True }]
    # Get the result of this query
    r = metaweb.read(q)
    # The query is structured in such a way that the result is almost ready
    # to be reused as a write query. We loop through the properties of the
    # result, and for any that are non-empty, we copy them to a query object
    # adding "connect":"delete" to transform it into a MQL write query
    q = {}
    for p,v in r.iteritems():
        if p == 'id':  # leave the id of the query alone
            q[p] = v
            continue
        # if the property is not id, then the value is an array
        # if the array is empty, then skip this property; don't copy to query
        if len(v) == 0: continue
        # otherwise, iterate through the elements of the array
        # and add the connect:delete directive to each one
        for elt in v: elt['connect'] = 'delete'
        # and copy to the query
        q[p] = v
    # Return the MQL write query that we can use to unlink the object
    return q
def main():
    # Use the getopt module to parse the command line arguments
    try:
        opts, args = getopt.getopt(sys.argv[1:],
                                   "u:p:n:t:",
                                   ["user=","password=","name=","type=","debug"])
    except getopt.error, msg:
        print msg
        sys.exit(2)  # exit with an error status
    # Now process each of the parsed arguments
    username, password, name, type, debug = (None, None, None, None, False)
    for o,a in opts:
        if (o in ('-u', '--user')): username = a
        elif (o in ('-p', '--password')): password = a
        elif (o in ('-n', '--name')): name = a
        elif (o in ('-t', '--type')): type = a
        elif (o == '--debug'): debug = True
    if username is None or password is None:
        sys.exit("username and password must be specified")
    # Use the username and password to log in to Metaweb
    credentials = metaweb.login(username, password)
    # Start building a query to find objects to be unlinked
    # We will only attempt to unlink objects that we created ourselves.
    # By default we will unlink all objects we've created
    query = [{'creator': '/user/' + username,
              'id': None }]
    # If the name argument was specified, then we only want to unlink
    # objects with the specified name
    if (name is not None):
        query[0]['name'] = name
    # If the type argument was specified then we only want to unlink
    # objects with the specified type
    if (type is not None):
        query[0]['type'] = type
    # Run the query to get the objects we're going to delete
    objects = metaweb.read(query)
    # If we didn't find any, quit now
    if len(objects) == 0:
        sys.exit("Nothing to unlink")
    # Now unlink each of the objects we found
    for o in objects:
        id = o['id']
        print "Unlinking: %s" % id
        # build the query
        q = makeUnlinkQuery(id)
        if (debug): print "Unlink query: " + repr(q)
        # run the query
        result = metaweb.write(q, credentials)
        if (debug): print "Unlink result: " + repr(result)
# If we're executing this file from the command-line, run the main() function
if __name__ == "__main__": main()
[17] Specifically the page http://www.usmint.gov/mint_programs/50sq_program/index.cfm?action=schedule

