How we added public transport schedules to 2GIS



2GIS helps you find your way around the city. You open the app, type a street or organization name into the search, find it, and rejoice. Once the organization you need is found, a reasonable question arises: how do I get there? And while we have recently paid considerable attention to car scenarios, journey search on public transport was somewhat neglected. I will tell you how journey search was built and share the subtleties of collecting and processing the data.

Where the task came from


We like to talk to our users. At the end of 2016 we ran a survey to find out how our users use public transport. The results turned out to be curious, so we are sharing them.

"How often do you use public transport?"

Overall, across all the cities where 2GIS is present, more than half of respondents use public transport every day. The larger the city, the more people use public transport daily. On weekdays it is most popular with residents of Moscow and Saint Petersburg, while in other cities people actively use public transport on weekends as well.

Having dealt with frequency of use, it is time to look at which types of public transport citizens favour.

"Which types of transport do you prefer?"

The result is quite expected, at least for the top positions. In large cities the preferred type of transport is the subway. Buses take second place. More than a third of Muscovites use commuter trains. In Novosibirsk people ride minibuses, and in Saint Petersburg they are also fond of trams.

An interesting discovery was that half of respondents can reach their final destination on foot.

The next stage was finding out the weaknesses of 2GIS. We came to users with a question: what are we missing?

"What is missing in 2GIS?"

We solved the task of choosing specific types of transport with the recently released filters, where the user can specify which types of transport they want to use. But the question "When will the bus arrive?" remained relevant for 64% of the surveyed users.

At that point we started thinking about adding public transport schedules to 2GIS and about how to make it all happen.

Where to get the data?


This is the first question we faced. In most cases we collect the data for our products ourselves. When a new residential district is built in a city, an information collection specialist goes to the site and verifies all the necessary details "in the field". With the new feature the usual approach would not work. Sending specialists to stops is a useless waste of time and effort, because the schedule changes constantly: new carriers appear, routes are optimized, winter gives way to spring. Planned schedules and service intervals would simply become outdated while they were being collected.

Yes, unfortunately, data becomes outdated, and that was the second problem that arose before us. The logical and fairly obvious decision was to turn to those who make and control the schedule: the subordinate agencies. Often a constructive dialogue began only after a thorny, bureaucratic road and advice like: "Write to the Ministry of Transport / Deptrans, and then we will talk".

Finding the responsible people and starting a dialogue is only half the trouble.

Then a marathon began, with:

  1. Explaining what we want and why we need it.
  2. Convincing them that our idea is useful to residents and visitors of the city.
  3. Proving that 2GIS does not monetize route building that takes service frequency into account.
  4. Assuring them that it is safe.

Victory? Not quite.

The funniest part is the technical side of the matter, in particular the data transfer format. Yes, some cities have automated systems for maintaining the schedule and an API for accessing this data (GTFS, or JSON in their own format), but that awaited us far from everywhere. In some places we were simply offered to parse a website, again without access to the databases for security reasons. In others they were ready to send files (.xls, .doc, .pdf), but only once, with no way to update the information in our directory in a timely manner. First place for originality went to photos of a paper sheet with the public transport schedule.

And yet at first the problem seemed trivial: obtain public data from the primary source!

Loading data into the internal system


Having obtained access and loaded the data, we ran into another problem. You cannot just take someone else's data and load it into the internal system.

Why?

It is high time to tell how source data is stored in 2GIS. We develop all internal products for collecting and storing information in-house. The software for cartographers (who are responsible not only for the map but also for transport) is called Fiji. It has been described in detail separately (in short, in Fiji cartographers draw the transport graph, enter data on public transport, and store the schedule. All collected routes are already entered into the system).

The first analysis showed that the routes in our system and the suppliers' routes differ, in places drastically. We had to somehow match our own routes with the supplier's. This could of course be done manually, but we decided to write our own matcher.

As the intermediate storage format we chose GTFS, since it is an established standard and some suppliers can already provide the schedule in this format. For the intermediate database the matcher works with we chose PostgreSQL, and we wrote the matcher in Python for simplicity.

Matching simply by route type and name did not work out, because route names differ greatly between us and the suppliers. Matching by stop names failed for the same reason. As a result the matcher follows a fairly complex scheme, considering route geometry, transport type, and only then stop names and route numbers.

Even so, matching errors still occur, because suppliers have a very large number of route directions: separate ones for each weekday and for each day off. There are also errors in matching circular routes when they are defined differently by the supplier and in our internal system (Fiji).

Therefore the final decision rests with a person: the cartographer can manually cancel a schedule match if they see that the algorithm worked incorrectly.
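
The matching order described above (geometry and transport type first, stop names and route numbers after) could be sketched roughly as follows. This is only an illustration, not the actual 2GIS matcher: the `Route` class, the weights, the 150 m distance limit, and the 0.5 threshold are all hypothetical, and route geometry is assumed to be a non-empty list of points in a metric projection.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Route:
    number: str            # route number as shown to passengers
    transport_type: str    # e.g. "bus", "tram"
    stops: list            # stop names in travel order
    shape: list            # geometry: non-empty list of (x, y) points in metres


def mean_shape_distance(a, b):
    """Average distance from each point of `a` to the nearest point of `b`."""
    return sum(min(hypot(ax - bx, ay - by) for bx, by in b) for ax, ay in a) / len(a)


def match_score(ours, theirs, max_distance=150.0):
    """Return a score in [0, 1]; 0.0 means the routes cannot be the same."""
    # Transport type is a hard filter: a bus never matches a tram.
    if ours.transport_type != theirs.transport_type:
        return 0.0
    # Symmetric distance, so a short route lying on a long one is not a perfect match.
    distance = max(mean_shape_distance(ours.shape, theirs.shape),
                   mean_shape_distance(theirs.shape, ours.shape))
    if distance > max_distance:
        return 0.0
    geometry = 1.0 - distance / max_distance
    # Names are unreliable, so they only refine the score (hypothetical weights).
    shared = len(set(ours.stops) & set(theirs.stops))
    stops = shared / max(len(ours.stops), len(theirs.stops), 1)
    number = 1.0 if ours.number == theirs.number else 0.0
    return 0.6 * geometry + 0.25 * stops + 0.15 * number


def best_match(ours, candidates, threshold=0.5):
    """Pick the best supplier route, or None so that a cartographer decides."""
    scored = [(match_score(ours, c), c) for c in candidates]
    score, route = max(scored, key=lambda pair: pair[0], default=(0.0, None))
    return route if score >= threshold else None
```

Returning `None` below the threshold leaves room for the manual review step: an uncertain match is handed to a person rather than accepted silently.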

Algorithm

The core of the search algorithm is written in C++. In fact, journey search on public transport is not one algorithm but several. The walk to the nearest stops is computed by our pedestrian routing algorithm, which in turn consists of two algorithms: the "pixel" one (which we use to build a path across territory without a road graph) and the usual one (along the pedestrian graph to the stop).

To search for a journey between stops we use a heavily modified A*, to which we added support for the schedule. Previously, the waiting time for transport at a stop was a fixed "average" time for each project and each type of transport; now either the exact or the interval schedule is taken into account.
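
To illustrate the difference between the three kinds of waiting time, here is a minimal sketch in Python (the production core is C++ and is not shown in this article). The function name, its parameters, and the choice of half the interval as the expected wait are assumptions made for the example.

```python
from bisect import bisect_left


def waiting_time(arrival, departures=None, interval=None, average=300):
    """Seconds to wait at a stop for a passenger arriving at `arrival`.

    arrival    -- seconds since the start of the service day
    departures -- sorted departure times in seconds, if an exact schedule exists
    interval   -- service interval in seconds, if only an interval schedule exists
    average    -- fallback "average" wait for the transport type (hypothetical value)
    """
    if departures:
        # Exact schedule: wait for the first departure not earlier than arrival.
        i = bisect_left(departures, arrival)
        if i < len(departures):
            return departures[i] - arrival
        return None  # nothing left in this list; the caller must check the next day
    if interval is not None:
        # Interval schedule: assume a random arrival, so expect half the interval.
        return interval / 2
    # No schedule at all: the old behaviour with a fixed average.
    return average
```

In a schedule-aware search this value would be added to the cost of boarding at a stop, so that the same edge can cost a different amount depending on when the passenger gets there.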

Along the way the algorithm had to account for many amusing nuances in the data. For example, a route can have a departure time from a stop of 25 or even 47 o'clock. In terms of the data, this means it is the same trip that started the previous day and simply has not finished yet. We also have to account for trips that start "tomorrow": if the user looks for a route at the end of the current day, the next day has to be searched too (if you store the schedule by day, that is).
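
A small sketch of how such times can be handled, assuming the schedule is stored per service day as GTFS-style `HH:MM:SS` strings in which hours may exceed 23. The function names and the dictionary layout are hypothetical; daylight saving shifts are ignored for brevity.

```python
from datetime import date, datetime, timedelta


def parse_gtfs_time(value):
    """Convert 'HH:MM:SS' (hours may exceed 23) to seconds from the day start."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def departures_after(query, times_by_service_day):
    """Return real datetimes of departures at or after `query`, sorted.

    times_by_service_day maps a service date to a list of 'HH:MM:SS' strings.
    Yesterday is checked for trips still running past midnight (25:00, 47:00),
    and tomorrow is checked for queries made late in the evening.
    """
    today = query.date()
    found = []
    for day in (today - timedelta(days=1), today, today + timedelta(days=1)):
        midnight = datetime.combine(day, datetime.min.time())
        for value in times_by_service_day.get(day, []):
            moment = midnight + timedelta(seconds=parse_gtfs_time(value))
            if moment >= query:
                found.append(moment)
    return sorted(found)
```

For example, with a departure stored as `25:10:00` on 1 March, a query at 00:40 on 2 March still finds it as 01:10 on 2 March, even though it belongs to the previous service day.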

Separately, we solved the problem of how to combine data with and without a schedule. In the end we decided that routes without a schedule still take part in the search but simply have a lower weight. So, if a route without a schedule shares its stopping platforms with a route that has a schedule, we simply merge it into the same result; if it runs some other way, it becomes a separate journey option with a lower weight, since we know nothing about the waiting time at the stop.
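
The merging rule can be pictured with a simplified sketch. The `Option` class, the `merge_options` function, and the 0.5 penalty factor are hypothetical and serve only to show the idea of gluing by platform sequence and down-weighting the rest.

```python
from dataclasses import dataclass


@dataclass
class Option:
    routes: list          # route names served by this journey option
    platforms: tuple      # stopping platform ids in travel order
    has_schedule: bool
    weight: float = 1.0   # higher weight means higher in the results


def merge_options(options, no_schedule_factor=0.5):
    """Glue schedule-less routes onto scheduled ones that use the same platforms."""
    merged = {}
    # First pass: group options that have a schedule by their platform sequence.
    for opt in options:
        if opt.has_schedule:
            group = merged.setdefault(
                opt.platforms, Option([], opt.platforms, True, opt.weight))
            group.routes.extend(opt.routes)
    result = list(merged.values())
    # Second pass: attach or down-weight options without a schedule.
    for opt in options:
        if opt.has_schedule:
            continue
        if opt.platforms in merged:
            # Same platforms: show it together with the scheduled route.
            merged[opt.platforms].routes.extend(opt.routes)
        else:
            # Different path: keep it as a separate option with a lower weight.
            result.append(Option(list(opt.routes), opt.platforms, False,
                                 opt.weight * no_schedule_factor))
    return sorted(result, key=lambda o: o.weight, reverse=True)
```

The penalty is applied only to options that stay separate, because for them nothing is known about the waiting time at the stop.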

Since 2GIS works both online and offline, the algorithm runs both in the app and on the server. Although the schedule data is more or less static, we still use server-side search, because on slow devices, when an Internet connection is available, a request to the server completes much faster than a local search. For server-side search we use 8 search backends located in three data centers in Novosibirsk, Moscow and Dronten (Netherlands).

Release


You can evaluate the final result of adding public transport schedules to 2GIS in our mobile app on Google Play and the App Store. It will appear in the web version a little later.

After the release we received a great deal of feedback. Having analysed the negative part, we identified two main causes of complaints:

  1. We did not properly tell users that the schedule is now used in journey search, and so we broke their habitual scenarios of working with the app.
    When searching for a route in the evening or at night, users no longer saw their usual routes in the results. The control for choosing the trip date/time when building a route fell out of sight.

    Most requests to technical support looked roughly like this:

    - Hello, your journey search on public transport has become inadequate, because … /description of a specific problem/.
    - You know, we have introduced the schedule in journey search on public transport; here is the control where you can set the time the trip is planned for.
    - I see, thanks a lot!
  2. The algorithm tried not to offer routes without a schedule (or to rank them lower in the results) when an alternative with a schedule existed. Because of this, in some cases the results became less relevant.

    Our technologists had to urgently and manually enter the interval schedule for all the remaining types of transport to bring them back into the results, and we had to carry out an additional check of the search algorithms.


What Captain Obvious conclusions can be drawn from the launch?

  • If you plan to use someone else's data in your app, be sure to think about how you will obtain that data. It is not a given that your expectations and reality will coincide. Account for the risks.
  • If you significantly change the current logic of the app, be sure to tell users about it and teach them to use the new features: hardly anyone reads the "What's new" text in the store.
  • Prepare technical support for a surge in requests regardless of how well you explained the new feature and how to use it.

P.S. On data completeness


Since we did not manage to reach agreements with all the Deptrans offices / Ministries of Transport, the schedule is for now available only in Moscow, Saint Petersburg, Novosibirsk, Yekaterinburg, Krasnoyarsk, Omsk, Chelyabinsk, Krasnodar and Rostov-on-Don. We will increase schedule coverage in these cities, and add new cities, as the data becomes available.