Welcome to Highload Cup 2018! Carefully read the rules and the technical task outlined below. If something is not clear, or doesn't work or you've got some ideas how to improve our system, you can always write to cups@corp.mail.ru or to our chat in Telegram. Good luck!
Contest legend
In an alternate reality, humanity decided to create and launch a global "other half" search system. Such a system is designed to reduce the number of single people in the world and contribute to the creation of strong families.
Participants of the Highload Cup 2018 competition are invited to act as an engineer who was asked to create a prototype of such a system. The prototype should, as soon as possible, send the correct responses to requests from third-party services that do something with the responses (for example, they may display it to users in shiny user interfaces). In essence, it should serve as a functional API for external hypothetical services.
1. Competition rules
To get started, %username%, launch your favourite IDE and download an archive with test data in JSON format from contest website https://highloadcup.ru. You need to first create and then deploy a productive application server that will implement the necessary Web API to this data. Pay attention that solutions must be packed as docker images. You can learn more about it by reading our article below.
You can use any web technologies that you can find or come up with. Choose your own programming language and a framework. This could be C++, Java + Tomcat, Python + Django, Ruby + RoR, GoLang, JavaScript + NodeJs, Haskell or something else you like. Also for data storage: MySQL, PostgreSQL, Redis, MongoDB, caches - all is up to you! Pay attention, we estimate not only the number of correct answers to requests, but also the speed of the server - choose carefully!
First test your solution locally on the test data. And when you are ready, build a docker image from it and upload it into the system of the competition. The corresponding commands are written on the task page. After the container is uploaded, a record will appear on the same page about the accepted solution and about putting it in a queue for preliminary shelling.
Tip 1. If the same solution is uploaded twice unchanged, then the system will not start the shelling. In each decision there should be, albeit minimal, differences from the previous ones.
Shelling scenario:
docker run
). In case of launch errors, it will be shown on the page with the shelling log.data.zip
will be available in /tmp/data
directory. It contains "production" data (approximately 10 MB of data for the preliminary and 1 GB for a full shelling). Please note that the /tmp/data
directory is read-only, so the solution must load the archive into RAM for processing. The archive itself will contain files with names like "accounts_<file_number>.json". These files content is a valid JSON data.{"accounts": [
{
"id": 10003,
"fname": "Мария",
"email": "ewheten@icloud.com",
"interests": [
"Красное вино",
"Стейк",
"Вкусно поесть"
],
"status": "свободны",
"premium": {
"start": 1533321770,
"finish": 1533321770
},
"sex": "f",
"phone": "8(985)4076805",
"likes": [
{
"ts": 1476378752,
"id": 41803
},
...
],
"birth": 870172195,
"city": "Испляндия",
"country": "Кроноштадт",
"joined": 1450137600
},
... // and many other accounts below
]
Host: accounts.com
header over HTTP/1.1 with re-used connections (keep-alive). Network losses are completely absent.Tip 2. In case of hacker attacks on the servers of the Highload Cup 2018 competition are noticed, the participant will be banned, and the results of the shelling won't be counted.
Please, note! Preliminary shelling starts automatically and is intended to test solutions at low load. For such a shelling, the results are shown in the form of graphs, but not rated. To participate in the rating, you must manually launch rating shelling, which is carried out in much more hardcore conditions. The number of rating shelling sessions is limited, 4 launches in 12 hours.
The results of rating attacks on all participants will be tabulated on the site. The best of the best will receive prizes!
2. Description of the subject area
Both in test, and in the "production" data there are records about one entity: Account
. It describes all known information about the user - his name, contacts, interests, revealed sympathy for other users. Guaranteed correctness of the data provided in accordance with the following types and restrictions.
One Account
record (Profile) contains following personal data:
Tip 3. All data is randomly generated and not related to real people, contacts or places, even if there were coincidences. The data generator code does not use third-party solutions, except for importing the random, datetime, calendar and string modules from the standard Python library.
Also in the Account
record there are fields specific to the "other half" search system:
3. Required API description
The API is the http request schemas that the server developed by the member must maintain. URLs are built according to the REST paradigm. The angle brackets indicate parts of the URL that can and will vary from request to request.
All responses from the server take into account the headers Content-Type
, Content-Length
, Connection
.
Tip 4. All examples are further nicely and neatly formatted for easier perception. Server responses do not include formatting. Cyrillic and special characters in the URL are encoded by the python function urlencode.
Data retreival requests (GET):
Retreiving list of users: /accounts/filter/
This API method is planned to be used to search for users by predefined or desired fields. For example, someone wanted to see all people of a certain age and gender who live in a particular city.
The response body is expected to have a {"accounts": [...]}
structure with users whose data correspond to the restrictions specified in the GET parameters. For each matching account entry, it is not necessary to transfer all the data known about it, but only the id
, email
and those used in the request.
20.12.2018: this request do not use interests and likes in output anymore. This is done by restrictions of server storages.
As a result, users should be sorted by descending values in the id field. The number of records to be selected is limited by the required GET parameter limit
.
The remaining GET parameters are configured as <field>_<predicate>
. For different fields, only certain filtering predicates can be used, which are listed in the table below. In this query, the action of several parameters is added, that is, first filtering one by one, then filtering the result by the second, and so on.
# | Field | Possible predicates |
---|---|---|
1 | sex | eq - matching specific gender - "m" или "f"; |
2 | domain - select all whose emails have the specified domain;lt - select all whose emails are lexicographically earlier;gt - same but lexicographically later; | |
3 | status | eq - matching specific status;neq - select all whose status not matching specified one; |
4 | fname | eq - matching specific first name;any - in any of the names listed separated by commas;null - select all those who have a first name specified (if 0) or not (if 1); |
5 | sname | eq - matching specifict last name;starts - select all whose last names begin with the transmitted prefix;null - select all those who have a last name specified (if 0) or not (if 1); |
6 | phone | code - select everyone who has a specific code on the phone (three digits in brackets);null - similar to the rest of the fields; |
7 | country | eq - everyone who lives in a particular country;null - similar; |
8 | city | eq - everyone who lives in a particular city;any - in any of the cities listed separated by commas;null - similar; |
9 | birth | lt - select anyone born before the specified date;gt - select anyone born after the specified date;year - select anyone who was born at that year |
10 | interests | contains - select everyone who has all the listed interests;any - select everyone who has any of the listed interests; |
11 | likes | contains - select everyone who likes all listed users(in the value - comma-separated ids); |
12 | premium | now - all who have a premium for the current date;null - similar |
Of course, we do not generate sets of these parameters by absolute randomness. Only certain combinations are allowed and each field has the probability of including it in the request. Combinations and probabilities are chosen for a reason - try to use this knowledge to your advantage.
Sample request and correct response to it::
GET: /accounts/filter/?status_neq=всё+сложно&birth_lt=643972596&country_eq=Индляндия&limit=5&query_id=110
{
"accounts": [
{
"email": "monnorakodehrenod@list.ru",
"country": "Индляндия",
"id": 99270,
"status": "заняты",
"birth": 581863572
},{
"email": "erwirarhadmemeddifde@yahoo.com",
"country": "Индляндия",
"id": 98881,
"status": "свободны",
"birth": 640015608
},{
"email": "rupewseor@rambler.ru",
"country": "Индляндия",
"id": 98828,
"status": "заняты",
"birth": 604256977
},{
"email": "fiotnefaersohhev@inbox.ru",
"country": "Индляндия",
"id": 98804,
"status": "свободны",
"birth": 596799123
},{
"email": "geslasereshedot@yahoo.com",
"country": "Индляндия",
"id": 98718,
"status": "свободны",
"birth": 640919302
}
]
}
Tip 5. In the case of an unknown field or an unresolved predicate, code 400 with an empty body is expected in the response. In all other cases, a 200 response is expected, even if no users were found.
Tip 6. All API requests have a technical GET parameter query_id
. The solution should simply ignore this parameter, since it does not require any action.
Splitting users into groups: /accounts/group/
This API method is planned to be used to create reports on the system operation. The fields by which the grouping is performed are transferred in the GET parameter keys
, separated by commas. They are not as numerous as in the user filtering request. There are five fields for grouping - sex, status, interests, country, city
.
Before grouping, it is necessary to perform a selection as in the previous query, but by specific values, and not by predicates. For example, if country=Алания
is specified in the GET parameters, then grouping is performed only by users from this country. Sampling can go on any field, but the value in it will be only one (for likes there will be only one id, for interests only one line, for birth and joined - there will be one number - year).
The response body is expected to have a {"groups": [...]}
structure with a list of groups. Each group must have the keys, which were grouped with the corresponding specific values. For each identified group, it is necessary to calculate how many users it has got into and to record as a result using the count
key. That is, the aggregation function for this query is only one and this is a count.
As a result, it is necessary to return not all identified groups, but only the N largest ones or the N smallest ones. The number N is given by the GET parameter limit=N
, and whether to return first large or small first - with the GET parameter order=-1
or order=1
respectively. In response, there may be groups with the same count
and this can create problems at the stage of validating responses. Please sort these groups among themselves by the values of other fields in the order specified by order
.
Please note that N does not exceed 50. Perhaps this fact will help you with optimization :)
Sample request and correct response to it:
GET: /accounts/group/?birth=1998&limit=4&order=-1&keys=country
(return 4 countries with the most users since the year of birth 1998)
{"groups": [
{"country": "Малатрис", "count": 8745},
{"country": "Алания", "count": 4390},
{"country": "Финляндия", "count": 2100},
{"country": "Гератрис", "count": 547}
]}
Tip 7. If unexpected grouping fields or unknown GET parameters appear in the request, code 400 with an empty response body is expected in the response.
Compatibility recommendations: /accounts/<id>/recommend/
This query is used to search for the "other half" by the specified user data. The request passes the id
of the user for whom those who are best compatible by status
, age
and interests
are searched. The decision should check compatibility only with the opposite sex (we are not against sexual minorities and condemn discrimination, it just happened :)). If a country or city with the keys country
and city
, respectively, is transmitted in a GET request, then you need to search only among those living in the specified location.
The response is expected code 200 and the structure {"accounts": [...]}
or code 404, if the user with the required id
is not found in the stored data. The accounts
key should be N users sorted in descending order of compatibility with the indicated id
. The number N is specified in the request with the GET parameter limit
and is no more than 20.
Compatibility is defined as a function of two users: compatibility = f (me, somebody)
. The function is built by the participants themselves, but so as to comply with the following rules:
Only the following fields should be displayed in the final list: id, email, status, fname, sname, birth, premium, interests
. If the answer contains equally compatible users (the same status
, interests
, birth
), then output them in ascending order by id
20.12.2018: this request do not use interests in output anymore. This is done by restrictions of server storages.
Sample request and correct response to it:
GET: /accounts/89528/recommend/?country=Индция&limit=8&query_id=151
(return the 8 most compatible with user id=89528
in the country "Индция")
{
"accounts": [
{
"email": "heernetletem@me.com",
"premium": {"finish": 1546029018.0, "start": 1530304218},
"status": "свободны",
"sname": "Данашевен",
"fname": "Анатолий",
"id": 35473,
"birth": 926357446
},{
"email": "teicfiwidadsuna@inbox.com",
"premium": {"finish": 1565741391.0, "start": 1534205391},
"status": "свободны",
"id": 23067,
"birth": 801100962
},{
"email": "nonihiwwahigtegodyn@inbox.com",
"premium": {"finish": 1557069862.0, "start": 1525533862},
"status": "свободны",
"sname": "Стаметаный",
"fname": "Виталий",
"id": 90883,
"birth": 773847481
}
]
}
Tip 8. If there is no user in the stored data with the passed id, then 404 code is expected with an empty response body.
Selection for similar sympathies: /accounts/<id>/suggest/
This type of query is similar to the previous one in that it is also about searching for the "other half". The user id
for which we are looking for the "other half" is sent in a similar way and the GET parameter limit
is used in the same way. Differences are in the implementation. Now we are looking for people of the same sex who like the same “likes” and offer those whom they recently liked themselves. If the country
or city
GET parameter is transmitted in the request, then you should look for "similar sympathies" only in a certain location.
Similarity between sympathies is defined as a function: similarity = f (me, account)
, which is calculated uniquely as the sum of fractions 1 / abs (my_like ['ts'] - like ['ts'])
, where my_like and like are sympathy for the same user. For a fraction, where my_like ['ts'] == like ['ts']
, replace the fraction with 1
. If there are no common likes, then users should be considered completely different from similarity = 0
. If one account has several likes on the same user with different dates, then the average of their dates is used in the formula.
The response contains a list of those who have not yet been liked by the user with the specified id
, but whom the users with the most similar likes liked. Sort by similarity descending, and between likes of one such user - by descending of the like date.
Sample request and correct response to it:
GET: /accounts/51774/suggest/?country=Испляндия&limit=6&query_id=152
{
"accounts": [
{
"email": "itwonudiahsu@yandex.ru",
"id": 94155,
"status": "заняты",
"fname": "Никита"
},{
"email": "neeficyreddohypot@ymail.com",
"id": 93449,
"status": "свободны",
"fname": "Иван"
},{
"email": "sotheralnes@inbox.ru",
"id": 89997,
"sname": "Лукетатин",
"fname": "Руслан",
"status": "заняты"
},{
"email": "kihatneselritunuwryr@ya.ru",
"id": 88119,
"sname": "Лукушутин",
"fname": "Николай",
"status": "свободны"
},{
"email": "otnideonfomedec@icloud.com",
"id": 87873,
"status": "свободны",
"sname": "Фаетавен",
"fname": "Сидор"
},{
"email": "poodreantasis@me.com",
"id": 85461,
"sname": "Даныкалан",
"fname": "Вадим",
"status": "заняты"
},
]
}
Tip 9. If there is no user in the stored data with the passed id, then 404 code is expected with an empty response body.
Data change requests (POST):
Add new user: /accounts/new/
This query simply adds a new user record to the stored data. New data is recorded in the request body in JSON format. It is assumed that the solution itself will control the uniqueness of the fields and data types.
In response, code 201 is expected with an empty JSON in the response body ({}
) if the creation of a new user was successful. In the case of incorrect data types or unknown keys, you must return the code 400 with an empty body.
Sample request and correct response to it:
POST: /accounts/new/
...
{
"sname": "Хопетачан",
"email": "orhograanenor@yahoo.com",
"country": "Голция",
"interests": [],
"birth": 736598811,
"id": 50000,
"sex": "f",
"likes": [
{"ts": 1475619112, "id": 38753},
{"ts": 1464366718, "id": 14893},
{"ts": 1510257477, "id": 37967},
{"ts": 1431722263, "id": 38933}
],
"premium": {"start": 1519661251, "finish": 1522253251},
"status": "всё сложно",
"fname": "Полина",
"joined": 1466035200
}
{}
Tip 10. In the case of incorrect data types, the presence of unknown keys or violation of uniqueness, you must return the code 400 with an empty body.
Update user data/accounts/<id>/
This request updates the data of a single user in the stored data. The request body in JSON format contains only updated fields and their values. The id
field is never contained among the updateable fields and is sent to the request URL. It is assumed that the solution itself will check the uniqueness of the updated fields and data types.
In response, code 202 is expected with an empty JSON in the response body ({}
) if the update was successful. If the record with the specified id
does not exist in the available data, then the 404 code with an empty body is expected. If the record exists, but unknown fields are transmitted in the request body or value types are incorrect, then code 400 is expected.
Sample request and correct response to it:
POST: /accounts/46133/?query_id=308
...
{
"birth": 664945551,
"city": "Санктобирск",
"email": "fywdolpa@yandex.ru",
"status": "заняты",
"country": "Алмаль"
}
{}
Tip 11. Similar to adding a new user, in the case of incorrect data types, the presence of unknown keys or violation of uniqueness, you need to return the code 400 with an empty body.
Add new likes: /accounts/likes/
This request adds a lot of new likes to many different users. This is how dating services work in life: the client application recommends suitable candidates, and users put a mark of sympathy (inspired by badoo). There are no restrictions on uniqueness in likes.
In the body of the request, the {"likes": [...]}
structure is passed, where likes
contains an array of objects with such keys:
In response, code 202 is expected with an empty JSON in the response body ({}
) if the update was successful. If the body of the request passed unknown fields or value types are incorrect, then code 400 is expected.
Sample request and correct response to it:
POST: /accounts/likes/?query_id=316
...
{"likes":[
{"likee": 3929, "ts": 1464869768, "liker": 25486},
{"likee": 13239, "ts": 1431103000, "liker": 26727},
{"likee": 2407, "ts": 1439604510, "liker": 6403},
{"likee": 26677, "ts": 1454719940, "liker": 22248},
{"likee": 22411, "ts": 1481309376, "liker": 32820},
{"likee": 9747, "ts": 1431850118, "liker": 43794},
{"likee": 43575, "ts": 1499496173, "liker": 16134},
{"likee": 29725, "ts": 1479087147, "liker": 22248}
]}
{}
Tip 12. If an unknown field or value type is passed in the request body, the code 400 with an empty body must be returned.
Tip 13. For all URLs that are not listed in the above API, a response with code 404 and an empty body is expected.
Now, dear participant, when you got acquainted with the rules for holding the Highload Cup 2018 and setting the task, it’s time to try and win!
We are Technopark Laboratory and Mail.ru Group. With all our hearts we wish you good luck!