Discussions on narphorium

  1.  

    Other NNDB keys

    1. The stuff you're doing with NNDB people is _awesome_. Immediately upon seeing it I thought about possible data-extractions that could be easily done from NNDB since we can auto-reconcile the data. So I went to look at what we could load, and I found that a lot of the possible properties we could load are also NNDB links. For example place of birth links to an NNDB geo object.

       Would it be possible to use the techniques you used to load NNDB people to load NNDB geo, and other NNDB types?

      1. Thanks Alex, I agree there's a lot of really great data at NNDB so if you guys can find a way to crawl it, that would be a big plus. If not, people can use the split-view to fill it in over time.

        In theory, I should be able to use the same reconciliation process to link companies, cities etc. The only problem is that they don't seem to have large alphabetical lists of those entities the same way they have for people. I see that there are some lists of companies by industry so that may be a good start.

      2. Well, I'm already looking at taking some of those lists of people, and doing loads of them. I'd also like to be able to go at the various people pages and extract pieces of information (like place and date of birth).

         Would you mind describing the process you're using for reconciliation somewhere? It'd be cool to see if that process might be applicable in different areas.

      3. The reconciliation process that I used for people was pretty basic. Since NNDB only has relatively popular people, I was able to match them with existing /people/person topics in Freebase. I only matched them if the name matched exactly, and I left out any names that matched more than one Freebase topic. Because of that, I still have 3,600 people who haven't been reconciled, but I have some ideas for how to automatically reconcile those topics as well; it's just going to take some time. Once I have a more robust reconciliation tool I'll write a blog post about it.

        I also extracted a list of 48 sectors of NNDB companies last night and that gave me 3,100 companies. I haven't tried to reconcile any of them yet but I imagine a similar process might work.

      4. Ahh yes, the exact name match. :)

         That's why having strong identifiers (like NNDB keys and GUIDs) is so great, I guess. I'm looking forward to reading your blog post about a more robust reconciliation technique; it's definitely one of the harder problems we face.
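
The matching rule described in this thread (accept a link only when the name matches exactly and maps to a single Freebase topic, and set everything else aside) can be sketched roughly as follows. This is a minimal illustration, not the actual tool used for the load: the function name, the topic IDs, and the sample names are all hypothetical, and the input is assumed to be plain in-memory lists rather than anything fetched from NNDB or Freebase.

```python
from collections import defaultdict

def reconcile_by_exact_name(source_names, freebase_topics):
    """Link a name to a topic ID only when exactly one topic carries that name.

    source_names: names taken from the external site (hypothetical input).
    freebase_topics: (topic_id, name) pairs (hypothetical input).
    """
    # Index topic IDs by their exact name so ambiguous names are easy to spot.
    ids_by_name = defaultdict(list)
    for topic_id, name in freebase_topics:
        ids_by_name[name].append(topic_id)

    matched = {}
    unreconciled = []
    for name in source_names:
        candidates = ids_by_name.get(name, [])
        if len(candidates) == 1:
            matched[name] = candidates[0]
        else:
            # Either no topic has this name or several share it; leave for later.
            unreconciled.append(name)
    return matched, unreconciled

# Hypothetical sample data, for illustration only.
topics = [
    ("/topic/paul_newman", "Paul Newman"),
    ("/topic/john_smith_1", "John Smith"),
    ("/topic/john_smith_2", "John Smith"),
]
names = ["Paul Newman", "John Smith", "Jane Doe"]
matched, unreconciled = reconcile_by_exact_name(names, topics)
```

Names that end up in `unreconciled` correspond to the leftover people mentioned above: they need either a stronger identifier or a more robust matching step.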




  2.  

    NNDB Links for People

    1. I would like to add about 20,000 links from people to their NNDB pages. To do this I've created the NNDB Profile Page type.

      I've gone through all the people on Freebase and matched them to their NNDB page by name. In the cases where several people share the same name, I simply ignore those pages.

      Would it be appropriate to upload this data to the sandbox?

      1. It would indeed. Please go ahead.

      2. OK, the complete set of links has been written to the sandbox. Please check out the results and let me know if I can write them to the main database.

      3. The data looks good!

        However, I lied when I said earlier, on the mailing list, that the IMDB Profile Page model was the way to go. We now have the ability to use keys into an external database as a way to generate URIs, which further provides uniqueness checking. I am working on converting our IMDb references into this form. It would be great if you could wait on the final NNDB load and model it that way; I will be happy to show you how once I figure it out myself. (-:

        1. Have you made any progress on this? I've looked at the documentation on enumerations but I can't figure out how to apply it to my model.
      4. Sounds like a great way to model these things. I look forward to learning how to use this new technique.

      5. Oh, yeah! It’s totally done and I forgot to come back here.

        First, I made namespaces (/authority/imdb/film, name, character). Then I made properties that enumerate those namespaces and attached URI templates to them.

        Check out the IMDb profile property on film. It expects Enumeration as its type. Then you have to get a little fancy and switch to the admin view. Set the Enumeration property of the property to the expected namespace. Then co-type the property as a Foreign key property and set a URI template. Set its type to URI Template and fill in both the canonical template (used to generate and recognize URIs) and the other templates (used only for recognition).

        The schema UI will support this at some point… just not yet, as it’s kind of a power-user thing.

      6. Ok, I've created the namespace, I've set up the enumeration property on the NNDB Person type and I've attached a URI template to that property.

        Then I added a sample key to the Paul Newman topic and the NNDB link shows up as expected. Unfortunately, the ID has a forward slash in it which gets escaped and breaks the link. I went back to the Foreign Key Property and explicitly disabled URI encoding but that hasn't fixed it. Any ideas on how to handle this?

        Coincidentally, tsegaran added a foreign key to a NYT page on the same Paul Newman topic and his key also contains forward slashes, but he seems to have entered the weblink separately without using a URI template.

      7. Ah, yes, the character escaping with keys and URI templates... I am in the process of converting the NYT keys to use URI templating, and I also ran into that problem. There's a bug filed to have the UI behave properly when it encounters escaped URLs. I'll post back when there's a status update on this.

        Toby (tsegaran) actually added a key, and created a discrete weblink (it's not using URI templating). When I implement the URI templating, I'll be removing the superfluous weblink.

        BTW, good work!

      8. This sounds like simply a bug. You're right that the NYTimes links got added separately, and they'll probably need to be fixed. It may be too late to get a fix in for next week's release, but I'll try. For my and others' reference, this is CLI-4538 in our bug system.
      9. Ok, thanks guys. I'll watch for CLI-4538 in the release notes.
      10. Looks like everything is working fine now in the new release. Thanks for fixing this.
      11. I've uploaded a new version of the data to the sandbox. If no one has any objections, I'll add it to the main site.

        Is there a limit on how many writes I can do on the main site? Will I be able to make 19,000 writes in a day?

      12. I believe the normal limit is 10,000 writes.  I'll see if I can get your limit increased.
      13. Thanks Brian. The exact count should be 19,619 writes to the API with 2 properties being updated each time.
      14. Any luck getting my limit increased?
      15. I'll try to get it done before the sandbox refresh (Monday PM PST) so you can test against sandbox once more before going live.

      16. Shawn, your limit is now 25K, on sandbox and www.  Happy loading!
      17. Thanks for updating the limit. I ran one more test on sandbox and then uploaded them to www and exceeded my limit. I guess that's 25k combined sandbox & www so I'll upload the rest tomorrow.
      18. I tried running it again today and I exceeded my limit again about halfway through. Is it 25k writes or 25k facts? This only happens on www. The sandbox was able to write the whole dataset at once.
      19. Just noticed that the "limit exceeded" error message says that my max_writes is 10,000 per day. I guess www is still using the old limit.
      20. Hmmm, we just recently moved datacenters, and in the shuffle your write limit, which was previously upped, might have gotten reverted. I'll check it out and get back to you...
      21. Additionally, are you adding keys to 25K topics, or something less? Co-typing and adding the NNDB key is 2 primitives, so if you are trying to do that for 25K topics, we really need to set your limit to 50K.
      22. I'm adding the NNDB Person type to the topic and adding the key, so if the limit is on the number of primitives rather than the number of writes then I would need a limit of 50k.
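
The forward-slash problem reported in this thread can be illustrated with a small sketch. It is only an assumption about the general mechanism (percent-encoding a key before substituting it into a template), not a description of how Freebase expands URI templates; the `{key}` placeholder syntax, the template string, and the sample key are all hypothetical.

```python
from urllib.parse import quote

# Hypothetical template and key, for illustration only.
TEMPLATE = "http://www.nndb.com/people/{key}/"
SAMPLE_KEY = "123/000012345"

def expand(template, key, encode=True):
    """Substitute a key into a URI template, optionally percent-encoding it."""
    # With safe="", quote() also encodes "/" as "%2F", which changes the URL path.
    value = quote(key, safe="") if encode else key
    return template.replace("{key}", value)

escaped_url = expand(TEMPLATE, SAMPLE_KEY)              # slash becomes %2F
raw_url = expand(TEMPLATE, SAMPLE_KEY, encode=False)    # slash kept as a path separator
```

Under these assumptions the escaped form no longer points at the same path as the raw form, which matches the symptom of a generated link that looks right in the key but breaks when followed.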
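
The quota question at the end of this thread comes down to simple arithmetic. The sketch below uses the figures quoted above (19,619 topics, 2 primitives each, and limits of 10,000, 25,000, and 50,000) and assumes, as suggested in the last replies, that the limit counts primitives rather than API calls; the thread does not confirm which unit the limit actually uses, and the function name is hypothetical.

```python
def days_needed(topics, primitives_per_topic, daily_limit):
    """Return (total primitives, whole days needed) under a per-day limit."""
    total = topics * primitives_per_topic
    # Integer ceiling division: a partly used day still counts as a day.
    days = -(-total // daily_limit)
    return total, days

TOPICS = 19_619           # count quoted in the thread
PRIMITIVES_PER_TOPIC = 2  # add the type, add the key

total, days_at_10k = days_needed(TOPICS, PRIMITIVES_PER_TOPIC, 10_000)
_, days_at_25k = days_needed(TOPICS, PRIMITIVES_PER_TOPIC, 25_000)
_, days_at_50k = days_needed(TOPICS, PRIMITIVES_PER_TOPIC, 50_000)
```

If the limit does count primitives, the load is 39,238 primitives, which would explain running out about halfway through under a 25K limit and fitting in a single day only at 50K.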



  3.  

    Embassies

    1. I need to model embassies and consulates as a side-effect of some work I'm doing on travel and tourism. I see you already had a bit of work in that field, so I've made you a co-admin of my domain. Would you like to work together on this?
      1. Sure, I'd be happy to work on this together. I'm mostly working off of the Wikipedia lists but I'm sure there's other embassy/ambassador data out there.
      2. Yeah, Google knows about heaps. I just googled "list of embassies" and found lots. Might have to be careful about licensing though.
      3. Sounds like a good data mob project  :)



  4.  

    Thanks!

    1. Thanks for all your recent contributions.  I've just made you one of the "top users" on the homepage.
      1. Thanks skud! There's more data on the way.

