Drupal 8: Import an Atom feed using Migrate
I needed to get events from another system into Drupal 8. They were available through an Atom feed. At the time of writing, the Feeds module was not ready for Drupal 8.
Another way of importing (external) sources is through the core module Migrate. It has no UI, however, so it is not a point-and-click operation.
I'd like to share with you how to write a (simple) migration plugin. Bear in mind, though, that each migration is very specific and has its own particular needs.
So what do we need?
This blog post pointed me in the right direction.
Besides core Migrate we need a couple of contrib modules to get what we want:
- Migrate Plus to get an XML parser.
- Migrate Tools so we can use drush.
The source file
We need to know exactly what sort of file we're dealing with: what is in it, and how do we get it out?
This is (a part of) the source XML:
I've removed some parts to keep it small and left only one entry, but they all have the same structure.
    <?xml version="1.0"?>
    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:gd="http://schemas.google.com/g/2005" xmlns:pro="http://schemas.example.com/2011">
      <id>tag:example.com,2011-05-01:https://example.example.com/web/feeds/events</id>
      <updated>2018-05-14T17:56:33+02:00</updated>
      <category scheme="http://schemas.google.com/g/2005#kind" term="http://schemas.google.com/g/2005#event" />
      <title type="text">Example event feed</title>
      <subtitle type="text">Events for example</subtitle>
      <generator version="1.0" uri="https://example.example.com/">example</generator>
      <author>
        <name>example</name>
        <email>tickets@example.org</email>
      </author>
      <entry>
        <pro:type value="event" />
        <pro:id value="33233" />
        <pro:eventGroupId value="213066" />
        <category scheme="http://schemas.google.com/g/2005#kind" term="http://schemas.google.com/contact/2008#contact" />
        <title>House of cards</title>
        <pro:subtitle>Lorem Ipsum</pro:subtitle>
        <link rel="self" type="application/atom+xml" href="https://example.example.com/web/feeds/events/43234455" />
        <link rel="alternate" type="text/html" href="https://example.example.com/events/33233" />
        <link rel="alternate" type="text/calendar" href="https://example.example.com/web/events/233223.ics" />
        <link rel="related" type="text/html" href="https://example.example.com/events/listday?month=5&amp;year=2018&amp;day=04" />
        <id>tag:example.com,2011-05-01:https://example.example.com/web/feeds/events/2343443</id>
        <published>2018-03-20T17:17:16+01:00</published>
        <updated>2018-05-03T20:03:55+02:00</updated>
        <category scheme="http://schemas.example.com/2011#eventType" term="dancenight" label="Dancenight" />
        <pro:tags>rock, pop</pro:tags>
        <gd:eventStatus value="http://schemas.google.com/g/2005#event.confirmed" />
        <pro:spaces>
          <pro:space>SUB</pro:space>
        </pro:spaces>
        <gd:where rel="http://schemas.google.com/g/2005#event" label="example">
          <gd:entryLink>
            <entry>
              <title>example</title>
              <category scheme="http://schemas.google.com/g/2005#kind" term="http://schemas.google.com/contact/2008#contact" />
              <gd:structuredPostalAddress primary="true">
                <gd:street>ladida 39</gd:street>
                <gd:city>Sometown</gd:city>
                <gd:postcode>0000 AA</gd:postcode>
                <gd:formattedAddress>example, ladida 39, 0000 AA Sometown </gd:formattedAddress>
              </gd:structuredPostalAddress>
              <summary>example, ladida 39, 0000 AA Sometown </summary>
            </entry>
          </gd:entryLink>
        </gd:where>
        <gd:when startTime="2018-05-04T22:00:00+02:00" endTime="2018-05-05T02:00:00+02:00" />
        <gd:extendedProperty name="http://schemas.example.com/2011#doorsOpen">
          <gd:when startTime="2018-05-04T22:00:00+02:00" />
        </gd:extendedProperty>
        <pro:tickets isSoldOut="false">
          <pro:total>0</pro:total>
          <pro:remaining>0</pro:remaining>
        </pro:tickets>
        <link rel="hyperlink" href="https://www.example.net/" title="poop poop de doop" featured="false" />
        <link rel="audiolink" href="http://hcmaslov.d-real.sci-nnov.ru/public/mp3/Queen/Queen%20'A%20Kind%20Of%20Magic'.mp3" featured="false" />
        <link rel="videolink" href="https://www.youtube.com/watch?v=dQw4w9WgXcQ" featured="false" />
        <link rel="image" href="https://example.example.com/images/7/eventpublicationitem/320078/Screen Shot 2018-05-01 at 13.18.21.png" featured="false" />
        <content type="html">&lt;p&gt;Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer pulvinar nibh nec ante eleifend pulvinar. Nulla molestie vel justo ac faucibus. Duis consectetur eu ipsum at dictum. Suspendisse convallis hendrerit leo a molestie. Quisque sollicitudin felis velit, nec laoreet massa tincidunt et.&lt;/p&gt;&lt;p&gt;Aenean nec gravida mi, sodales hendrerit purus. Mauris gravida risus ipsum, sit amet porta tellus vehicula feugiat. Donec posuere fringilla sapien vel vestibulum. Nunc nec scelerisque ligula. Donec vitae tempus nulla, rhoncus egestas lacus. Nulla rutrum nec nulla ut ullamcorper. Ut consectetur blandit libero non eleifend. Duis ut rutrum sem. Sed dapibus lectus vel metus dictum euismod. Donec at purus vitae elit mattis consequat fringilla quis massa. Nam at velit sed lorem ultricies semper. Quisque viverra congue mi, at venenatis purus vehicula nec.&lt;/p&gt;</content>
      </entry>
    </feed>
Since we are importing events, I made a content type 'event' with the following fields:
- Title (default)
- Body (basic_html)
- Image (core image field)
- Start Date (date field)
- Address (plain text)
The title will be <title>, the body will be <content>, the image must be the 'href' attribute of the <link rel="image">, the start date is on the <gd:when> tag and the address is inside the <gd:where> tag, hmm...
Each item in our migration also needs a unique ID so rollbacks and updates are possible. We'll use the value of <pro:id/> for that.
A basic migration file has at least a few things: an ID and a label, a source, a process, a destination and its dependencies.
It looks something like this:
    id: events_importer   # name by which to call your plugin
    label: 'Import Atom feed Example'
    source:               # your incoming items
    process:              # the processing & mapping of values
    destination:          # the entity to save to
    dependencies:         # other modules that are needed
Xpath
The configuration for XML in migration .yml files works with XPath expressions. In this example our 'title' is at /feed/entry[1]/title and the 'content' at /feed/entry[1]/content. Testing these paths in this online xpath-tester gets us their values. If we try to get the ID, however, using /feed/entry[1]/pro:id/@value, we get nothing.
This is because of namespacing. The id tag is <pro:id value="33233"/>, where 'pro' is a namespace prefix. XPath cannot resolve that prefix until it has been declared.
Namespaces
Ok, so we have namespaces. How can we get the values of those fields? Our 'normal' xpaths return null.
This page explains namespaces and how they affect xpath.
We need to add those namespaces to the source declaration in our YAML file to be able to use them.
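To see the effect of prefixes outside of Migrate, here is a minimal sketch in plain PHP with SimpleXML. The trimmed-down feed, the helper name first_xpath_value() and the choice of 'atom' as prefix for the default namespace are assumptions made for this illustration; none of them come from the migrate modules.

```php
<?php

// A trimmed-down version of the feed above (illustration only).
$xml = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:pro="http://schemas.example.com/2011">
  <entry>
    <pro:id value="33233" />
    <title>House of cards</title>
  </entry>
</feed>
XML;

// Prefix => namespace URI. 'atom' is our own pick for the default namespace.
$namespaces = [
  'atom' => 'http://www.w3.org/2005/Atom',
  'pro' => 'http://schemas.example.com/2011',
];

/**
 * Returns the first match of an xpath as a string, or NULL without a match.
 *
 * Hypothetical helper, not part of Migrate or Migrate Plus.
 */
function first_xpath_value(string $xml, string $xpath, array $namespaces = []): ?string {
  $document = simplexml_load_string($xml);
  foreach ($namespaces as $prefix => $uri) {
    $document->registerXPathNamespace($prefix, $uri);
  }
  $matches = $document->xpath($xpath);
  return $matches ? (string) $matches[0] : NULL;
}

// Un-prefixed names only match elements that have no namespace: no result.
var_dump(first_xpath_value($xml, '/feed/entry/title'));
// With the prefixes registered, the same nodes can be addressed.
var_dump(first_xpath_value($xml, '/atom:feed/atom:entry/atom:title', $namespaces));
var_dump(first_xpath_value($xml, '/atom:feed/atom:entry/pro:id/@value', $namespaces));
```

The prefix used in the xpath does not have to match the one in the document; what counts is that it maps to the same namespace URI.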
In a migration configuration this is simple, our file now looks like this:
    id: events_importer
    label: 'Import Atom feed Example'
    source:
      plugin: url                 # the migrate plugin to use for fetching our file
      data_fetcher_plugin: http   # the protocol to use to get the file
      data_parser_plugin: xml     # the type of data to parse
      namespaces:                 # declaration of namespaces found in our source file
        atom: 'http://www.w3.org/2005/Atom'
        gd: 'http://schemas.google.com/g/2005'
        pro: 'http://schemas.example.com/2011'
      urls: 'https://us.example.com/feeds/events'   # the url of our source
      item_selector: '/feed/entry'                  # the items
As you can see, I've added all the declared namespaces, including the Atom one. In our XML the namespaces are declared at the top of the file with the "xmlns" attributes.

    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:gd="http://schemas.google.com/g/2005" xmlns:pro="http://schemas.example.com/2011">

Another thing to notice is the item_selector, in this case '/feed/entry'. We have many 'entry' nodes (in the example there is only one). Setting this will make the script iterate over them all.
The fields that we will add to the source are the fields on these entries.
So now we can add the fields from which we want the values, as described above. The fields key is part of our source, so the file above continues like this:

      # Since we are iterating over the items, the /feed/entry[i]/ part of our
      # xpath is a given, so the field selectors are relative to that.
      fields:
        -
          name: guid
          label: 'GUID'
          # Because we added the namespace declaration we can use the prefix.
          selector: 'pro:id/@value'
        -
          name: title
          label: 'Title'
          # Since I also added atom: to the namespace declaration, our title
          # path becomes this.
          selector: 'atom:title'
        -
          name: body
          label: Body
          selector: 'atom:content'
        -
          name: from_date
          label: 'From Date'
          selector: 'gd:when/@startTime'
        -
          name: address
          label: 'Location Address'
          # This path is rather funky, but it works.
          selector: 'gd:where/gd:entryLink/atom:entry/gd:structuredPostalAddress/gd:formattedAddress'
        -
          name: image
          label: 'Event image'
          # This gets the image link by its rel attribute. We do not use
          # link[8], because other links may be omitted.
          selector: 'atom:link[@rel="image"]/@href'
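The idea of "iterate over the items, then evaluate relative selectors on each item" can be sketched in plain PHP. This is not how Migrate Plus is implemented; it only mimics the behaviour with SimpleXML. The shortened feed, the image URL and the function names xpath_with_namespaces() and extract_items() are made up for the example. Plain SimpleXML needs the prefixed item selector '/atom:feed/atom:entry', whereas the migration file uses '/feed/entry'.

```php
<?php

// Shortened feed with one entry; the image URL is invented for this example.
$xml = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:gd="http://schemas.google.com/g/2005" xmlns:pro="http://schemas.example.com/2011">
  <entry>
    <pro:id value="33233" />
    <title>House of cards</title>
    <link rel="alternate" href="https://example.example.com/events/33233" />
    <link rel="image" href="https://example.example.com/images/event.png" />
    <gd:when startTime="2018-05-04T22:00:00+02:00" />
  </entry>
</feed>
XML;

$namespaces = [
  'atom' => 'http://www.w3.org/2005/Atom',
  'gd' => 'http://schemas.google.com/g/2005',
  'pro' => 'http://schemas.example.com/2011',
];

// Field name => selector relative to one entry, as in the fields list above.
$selectors = [
  'guid' => 'pro:id/@value',
  'title' => 'atom:title',
  'from_date' => 'gd:when/@startTime',
  'image' => 'atom:link[@rel="image"]/@href',
];

/**
 * Runs an xpath on an element after registering the given prefixes on it.
 */
function xpath_with_namespaces(SimpleXMLElement $element, string $xpath, array $namespaces): array {
  foreach ($namespaces as $prefix => $uri) {
    $element->registerXPathNamespace($prefix, $uri);
  }
  return $element->xpath($xpath) ?: [];
}

/**
 * Returns one array of field values per item matched by $item_selector.
 */
function extract_items(string $xml, string $item_selector, array $selectors, array $namespaces): array {
  $items = [];
  $feed = simplexml_load_string($xml);
  foreach (xpath_with_namespaces($feed, $item_selector, $namespaces) as $entry) {
    $item = [];
    foreach ($selectors as $name => $selector) {
      // The selector is evaluated with the current entry as context node.
      $matches = xpath_with_namespaces($entry, $selector, $namespaces);
      $item[$name] = $matches ? (string) $matches[0] : NULL;
    }
    $items[] = $item;
  }
  return $items;
}

print_r(extract_items($xml, '/atom:feed/atom:entry', $selectors, $namespaces));
```

Because the image selector filters on the rel attribute, it keeps working when the number or order of link elements differs between entries.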
So this gives us all our values. We now need to 'convert' some of them to the right format, or pass them to an additional plugin to 'preprocess' them.
How this should be done can be described in the process 'array'. Like so:
    process:
      # The user that created the item; it can be any existing user.
      uid:
        plugin: default_value     # we use the default_value plugin to set the default
        default_value: 1          # in this case user 1
      title: title
      # We split the body into its subparts 'value' and 'format' to set their values.
      'body/value': body
      'body/format':
        plugin: default_value
        default_value: basic_html # make sure 'basic_html' exists
      # The published value; we set it to TRUE.
      status:
        plugin: default_value
        default_value: 1
      # The content type.
      type:
        plugin: default_value
        default_value: event
      # We use the format_date plugin to transform the source date to a Drupal format.
      field_from_date:
        plugin: format_date
        from_format: 'Y-m-d\TH:i:sP'
        to_format: 'Y-m-d\TH:i:s'
        source: from_date
      field_location_address: address
      # The image is downloaded via the image_import plugin of migrate_file.
      # It returns an image entity reference that we can set on our element.
      field_event_image:
        plugin: image_import
        source: image
        # The constant is declared in the source 'array'. More on that below.
        destination: constants/file_destination
        uid: '@uid'               # this references the uid as set above, mind the quotes!
        alt: !image               # the ! renders the value as a string
        skip_on_missing_source: true   # useful so it does not fail when no image is present
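What the two format strings do to a feed date can be tried in plain PHP. The sketch below is my own approximation of the conversion, not the plugin's code, and convert_feed_date() is a hypothetical helper. The optional UTC conversion is there to show the difference between dropping the offset and converting it. Drupal date fields are generally stored in UTC, so it is worth checking which of the two you get. Recent versions of the core format_date plugin document from_timezone and to_timezone settings for this, but verify that against the version you run.

```php
<?php

/**
 * Converts a feed date like 2018-05-04T22:00:00+02:00 to Y-m-d\TH:i:s.
 *
 * Hypothetical helper for illustration only.
 */
function convert_feed_date(string $value, bool $to_utc = FALSE): string {
  // Same from_format as in the migration: the P matches the +02:00 offset.
  $date = DateTime::createFromFormat('Y-m-d\TH:i:sP', $value);
  if ($date === FALSE) {
    throw new InvalidArgumentException("Unexpected date format: $value");
  }
  if ($to_utc) {
    $date->setTimezone(new DateTimeZone('UTC'));
  }
  // Same to_format as in the migration: no offset in the output.
  return $date->format('Y-m-d\TH:i:s');
}

// The offset is simply dropped: prints 2018-05-04T22:00:00
echo convert_feed_date('2018-05-04T22:00:00+02:00'), PHP_EOL;
// Converted to UTC first: prints 2018-05-04T20:00:00
echo convert_feed_date('2018-05-04T22:00:00+02:00', TRUE), PHP_EOL;
```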
I've left out some fields that were similar to the ones I already showed you. The complete migration file looks like this:
    id: events_importer
    label: 'Import Atom feed Example'
    status: true
    source_type: 'XML files'
    source:
      plugin: url
      data_fetcher_plugin: http
      data_parser_plugin: xml
      track_changes: true
      namespaces:
        atom: 'http://www.w3.org/2005/Atom'
        gd: 'http://schemas.google.com/g/2005'
        pro: 'http://schemas.example.com/2011'
      urls: 'https://namespace.example.com/feeds/events'
      item_selector: '/feed/entry'
      fields:
        -
          name: guid
          label: 'GUID'
          selector: 'pro:id/@value'
        -
          name: title
          label: 'Title'
          selector: 'atom:title'
        -
          name: body
          label: Body
          selector: 'atom:content'
        -
          name: from_date
          label: 'From Date'
          selector: 'gd:when/@startTime'
        -
          name: address
          label: 'Location Address'
          selector: 'gd:where/gd:entryLink/atom:entry/gd:structuredPostalAddress/gd:formattedAddress'
        -
          name: image
          label: 'Event image'
          selector: 'atom:link[@rel="image"]/@href'
      constants:
        file_destination: 'public://imports/'
      ids:
        guid:
          type: integer
    destination:
      plugin: entity:node
      default_bundle: event
    process:
      uid:
        plugin: default_value
        default_value: 1
      title: title
      'body/value': body
      'body/format':
        plugin: default_value
        default_value: basic_html
      status:
        plugin: default_value
        default_value: 1
      type:
        plugin: default_value
        default_value: event
      field_from_date:
        plugin: format_date
        from_format: 'Y-m-d\TH:i:sP'
        to_format: 'Y-m-d\TH:i:s'
        source: from_date
      field_location_address: address
      field_event_image:
        plugin: image_import
        source: image
        destination: constants/file_destination
        uid: '@uid'
        alt: !image
        skip_on_missing_source: true
    migration_dependencies:
      required: {}
Running the import (finally!)
We are finally ready to actually run this script and import our items
(in reality you'll probably be testing this many many times before you get it right)
There are several ways to test your script. Perhaps the easiest is to import the .yml file via the configuration synchronisation UI. Go to 'Import' > 'Single item', select 'Migration' as the configuration type, paste your YAML code into the box and click 'Import'. Then run:

    drush migrate:import events_importer

Once you are sure your migration works you can leave it to cron to run it, with that same command.
Things that will go wrong:
Most likely you will get an error at some point during development. A few of the things that I found went wrong:
- I needed two patches to get around some errors (this may be fixed by now).
  The first thing was:

      [error] Migration failed with source plugin exception: Serialization of 'SimpleXMLElement' is not allowed.

  I used this patch.
  The other one apparently has to do with drush 9 not implementing a function:

      [error] Error: Call to undefined function Drupal\migrate_tools\Commands\drush_print_table() in Drupal\migrate_tools\Commands\MigrateToolsCommands->messages()

  I used this patch.
- At the point at which I added the importing of images, my migration simply failed without displaying any errors. It just said:

      [notice] Processed 0 items (0 created, 0 updated, 0 failed, 0 ignored) - done with 'events_importer'

  If this happens you can always run

      drush migrate-messages events_importer

  to see if anything was logged.
  In my case, the permissions for the 'public://imports/' directory were wrong and it could not be created. Once that was fixed the import ran and imported all my items, including the images!
- Another thing that might happen is that your migration fails due to an error and is therefore unable to reset its status back to 'idle'. The status then remains 'Importing' and you will not be able to run your migration again. You'll see an error like:

      [error] Migration events_importer is busy with another operation: Importing

  To see the status, run:

      drush migrate:status

  You'll see a list of migrations with their status and when they were last executed.
  To reset the status of one of them (to 'idle'), run the following, with events_importer being the id of the one to reset:

      drush migrate-reset-status events_importer

- Another useful thing to know is how to remove your configuration files.
  Via

      drush php

  you get the shell. In it you can run PHP. The following snippet will remove the configuration from your system:

      Drupal::configFactory()->getEditable('migrate_plus.migration.events_importer')->delete();

  This will allow you to re-install your module.
A better (or easier) way to deal with this is to make sure your MODULE.install does this for you while uninstalling the module, in which case you need to only uninstall and re-install the module.
I used this snippet:

    <?php

    /**
     * @file
     * events_import install file.
     */

    /**
     * Implements hook_uninstall().
     */
    function events_import_uninstall() {
      // Delete this module's migrations.
      $migrations = [
        'events_importer',
      ];
      foreach ($migrations as $migration) {
        Drupal::configFactory()->getEditable('migrate_plus.migration.' . $migration)->delete();
      }
    }

Sources
I have used many different sources to find out how to make this work. They may be useful to you too.
- This got me in the right direction: https://ohthehugemanatee.org/blog/2017/06/07/stop-waiting-for-feeds-module-how-to-import-remote-feeds-in-drupal-8/
- This took a while: namespaces and how they affect xpath
- Migrate API: https://www.drupal.org/docs/8/api/migrate-api/migrate-api-overview
- Migrate Plus project page: https://www.drupal.org/project/migrate_plus
- The migrate_file module: https://www.drupal.org/project/migrate_file
- An example from Lullabot (for JSON): https://www.lullabot.com/articles/pull-content-from-a-remote-drupal-8-site-using-migrate-and-json-api
- This blog: https://evolvingweb.ca/blog/drupal-8-migration-migrating-basic-data-part-1
- Applying patches with composer: http://www.anexusit.com/blog/how-to-apply-patches-drupal-8-composer
Happy migrating!