5 SQL Tricks Every Data Scientist Must Know
The world of data science is changing the way we work, communicate, and live. Data scientists are at the forefront of this change, helping unlock new insights and improve our lives. But in order to do so, they need to understand SQL—the standard language for managing data in relational databases.
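- Use BETWEEN for ranges and wildcards for patterns
If you want to find all the numbers between 1 and 10, for example, you cannot just type "1-10" into your query, because SQL would read that as a subtraction. Use the BETWEEN operator instead: WHERE value BETWEEN 1 AND 10, which includes both endpoints. Wildcard characters apply to text patterns in a LIKE comparison rather than to numeric ranges: "%" matches any sequence of zero or more characters and "_" matches exactly one character (e.g., 'a%' matches a, aaa, and aaaa, while 'a_' matches only two-character values that start with a). The "*" and "?" wildcards familiar from file searches are not standard SQL.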
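A range filter like the one described above can be sketched with Python's built-in sqlite3 module. The table name readings and its column value are hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical table and values, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (value INTEGER)")
conn.executemany(
    "INSERT INTO readings (value) VALUES (?)",
    [(0,), (1,), (5,), (10,), (11,)],
)

# BETWEEN is inclusive on both ends, so 1 and 10 are both kept.
in_range = [
    row[0]
    for row in conn.execute(
        "SELECT value FROM readings WHERE value BETWEEN 1 AND 10 ORDER BY value"
    )
]
print(in_range)  # [1, 5, 10]
conn.close()
```

The values 0 and 11 fall outside the range and are filtered out.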
- Use LIKE to match patterns and WITH to name subqueries
If you're looking for an exact value, such as someone whose name is "John Smith," a plain equality test is enough: WHERE FIRST_NAME = 'John' AND LAST_NAME = 'Smith'. LIKE is for partial matches.
Use the LIKE operator to find all rows that match a pattern of characters or numbers. For example, if you want to find all the customers whose names start with "J," use WHERE Name LIKE 'J%'.
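As a rough illustration of LIKE patterns, the sketch below runs against an in-memory SQLite database. The customers table and its rows are made up for the example; only the Name column and the 'J%' pattern come from the text above.

```python
import sqlite3

# Hypothetical customers table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (Name TEXT)")
conn.executemany(
    "INSERT INTO customers (Name) VALUES (?)",
    [("John Smith",), ("Jane Doe",), ("Alan Jones",), ("Jo",)],
)

# '%' matches any run of characters, so 'J%' means "starts with J".
starts_with_j = [
    row[0]
    for row in conn.execute(
        "SELECT Name FROM customers WHERE Name LIKE 'J%' ORDER BY Name"
    )
]

# '_' matches exactly one character, so 'J_' means "J plus one more character".
two_chars = [
    row[0]
    for row in conn.execute("SELECT Name FROM customers WHERE Name LIKE 'J_'")
]

print(starts_with_j)  # ['Jane Doe', 'Jo', 'John Smith']
print(two_chars)      # ['Jo']
conn.close()
```

Note that SQLite's LIKE ignores case for ASCII letters by default, while some other databases treat LIKE as case-sensitive, so results can differ between systems.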
The WITH clause. A WITH clause (also called a common table expression, or CTE) lets you define a named, temporary result set for a single query so that you can reuse it without repeating the subquery.
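For example, imagine you have data on all the colleges in the US stored in one table called "college," with each college's state recorded as an abbreviation (e.g., "WA" for Washington). Let's say you want to know what percentage of incoming freshmen at each college paid full tuition, and assume that this figure sits in a column (here cnty) that was not stored as a floating-point number. You could convert it with this query:
SELECT CAST(cnty AS FLOAT) AS percent_full_tuition FROM college;
CAST() converts a value from one data type to another, so this query runs fine on its own. The drawback appears when later steps need the converted column, because the CAST expression would have to be repeated in each of them. Instead, you can wrap the query in a WITH clause and select from the named result (a WITH clause must always be followed by a main statement):
WITH college_data AS ( SELECT CAST(cnty AS FLOAT) AS percent_full_tuition FROM college ) SELECT percent_full_tuition FROM college_data;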
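Here is a minimal sketch of a complete WITH query run through sqlite3. The college table and the cnty column follow the example above, while the name column, the sample rows, and the assumption that cnty is stored as text are hypothetical.

```python
import sqlite3

# The 'name' column and these rows are assumptions made for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE college (name TEXT, cnty TEXT)")
conn.executemany(
    "INSERT INTO college (name, cnty) VALUES (?, ?)",
    [("A College", "100"), ("B College", "87.5")],
)

# The CTE converts the text column once; the main query then filters
# on the converted value without repeating the CAST expression.
query = """
WITH college_data AS (
    SELECT name, CAST(cnty AS FLOAT) AS percent_full_tuition
    FROM college
)
SELECT name, percent_full_tuition
FROM college_data
WHERE percent_full_tuition < 100
"""
below_full = conn.execute(query).fetchall()
print(below_full)  # [('B College', 87.5)]
conn.close()
```

The CTE exists only for the duration of this one statement; a second query would need its own WITH clause.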
- Sorting data
Sorting is a key step in data analysis. It puts your data into a specific order so that you can easily find and analyze the information you need. The most basic way to sort data is with an ORDER BY clause.
In SQL, you can use the ORDER BY clause to sort your query results according to any column or columns present in your query. To do this, simply add an ORDER BY clause that specifies how you want your query results sorted:
SELECT * FROM tbl_name ORDER BY col1 ASC;
This will sort your results by column col1 in ascending order (A-Z). If we wanted it sorted in descending order (Z-A), we would use DESC instead:
SELECT * FROM tbl_name ORDER BY col1 DESC;
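Both queries return every record from table tbl_name; ORDER BY changes only the order of the rows, not which columns appear. To break ties, list more than one column: ORDER BY col1 ASC, col2 DESC sorts by col1 first and then by col2 among rows that share the same col1 value.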
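The sorting behavior can be checked with a small sqlite3 sketch. The table and column names tbl_name, col1, and col2 follow the queries above; the sample rows are invented for illustration.

```python
import sqlite3

# Sample rows are made up; they include a tie on col1 to show tie-breaking.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_name (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO tbl_name (col1, col2) VALUES (?, ?)",
    [("b", 2), ("a", 1), ("b", 1)],
)

# Descending sort on a single column.
desc_col1 = [
    row[0] for row in conn.execute("SELECT col1 FROM tbl_name ORDER BY col1 DESC")
]

# Sort by col1 ascending, then by col2 descending within equal col1 values.
multi_sorted = conn.execute(
    "SELECT col1, col2 FROM tbl_name ORDER BY col1 ASC, col2 DESC"
).fetchall()

print(desc_col1)     # ['b', 'b', 'a']
print(multi_sorted)  # [('a', 1), ('b', 2), ('b', 1)]
conn.close()
```

Without the second sort key, the relative order of the two "b" rows would not be guaranteed.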
- Using Arrays
Each value within an array has its own index number that determines its position relative to the other values in the array.
For example, if we had an array containing three values (John Doe, Jane Smith, and Joe Brown) and each value had its own index number (1, 2, 3), then our array would look like this: [John Doe] [Jane Smith] [Joe Brown].
The advantage of arrays over packing several values into a single delimited string is that each element stays separately addressable and the order of the elements is preserved.
Arrays are not available in every SQL database, but several systems, including PostgreSQL and BigQuery, provide an array type. An array is an ordered collection of items of the same type stored in a single column value. Arrays are useful because they let you store a short list of related values together without creating a separate table for them.
You can manipulate arrays in several ways, including accessing elements by subscript or using functions that operate on an entire array, such as array_length() and unnest() in PostgreSQL. Aggregate functions like SUM() work on rows rather than on arrays, so an array is usually expanded into rows first.
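The sketch below illustrates element access and expansion into rows. SQLite has no native array type, so it uses a JSON array as a stand-in, assuming the SQLite build includes the JSON functions (most recent builds do). The teams table and its columns are hypothetical, and the syntax differs from native arrays in databases such as PostgreSQL.

```python
import sqlite3

# A JSON array stored as text stands in for a native array column here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teams (team TEXT, members TEXT)")
conn.execute(
    "INSERT INTO teams (team, members) VALUES (?, ?)",
    ("analytics", '["John Doe", "Jane Smith", "Joe Brown"]'),
)

# Access one element by position (JSON paths count from 0).
first_member = conn.execute(
    "SELECT json_extract(members, '$[0]') FROM teams"
).fetchone()[0]

# Ask for the number of elements in the whole array.
member_count = conn.execute(
    "SELECT json_array_length(members) FROM teams"
).fetchone()[0]

# Expand the array into one row per element, in index order.
all_members = [
    row[0]
    for row in conn.execute(
        "SELECT j.value FROM teams, json_each(teams.members) AS j ORDER BY j.key"
    )
]

print(first_member)  # John Doe
print(member_count)  # 3
print(all_members)   # ['John Doe', 'Jane Smith', 'Joe Brown']
conn.close()
```

Once the elements are expanded into rows, ordinary SQL such as WHERE, GROUP BY, and aggregate functions can be applied to them.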
- Derived table
A derived table is a subquery placed in the FROM clause. Its result behaves like a temporary table that the outer query can select from, so you can isolate just the part of the data you care about. For example, say you have a table of users with their names, ages, and favorite colors. This query lists the adult users:
SELECT name FROM users_table WHERE age > 18;
That gives us a list of names, but suppose we don't really care about the names and just want to know what those adults like as their favorite color. We can wrap the filter in a derived table and select from it:
SELECT adults.favorite_color FROM ( SELECT name, favorite_color FROM users_table WHERE age > 18 ) AS adults;
The outer query sees only the rows and columns that the subquery returns, and that intermediate result is what is called a "derived" table.
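A subquery in the FROM clause, which is the usual meaning of a derived table, can be sketched as follows. The users_table name and the age filter follow the example above; the sample rows and the alias adults are assumptions made for illustration.

```python
import sqlite3

# Sample rows are invented; one user is under the age cutoff.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users_table (name TEXT, age INTEGER, favorite_color TEXT)")
conn.executemany(
    "INSERT INTO users_table (name, age, favorite_color) VALUES (?, ?, ?)",
    [("Ann", 25, "blue"), ("Bob", 17, "red"), ("Cy", 40, "blue"), ("Di", 30, "green")],
)

# The inner SELECT is the derived table; the outer query counts
# favorite colors among the adults it returns.
query = """
SELECT adults.favorite_color, COUNT(*) AS n
FROM (
    SELECT name, favorite_color
    FROM users_table
    WHERE age > 18
) AS adults
GROUP BY adults.favorite_color
ORDER BY adults.favorite_color
"""
color_counts = conn.execute(query).fetchall()
print(color_counts)  # [('blue', 2), ('green', 1)]
conn.close()
```

Bob is filtered out inside the derived table, so his color never reaches the outer query.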
If you want a deeper understanding of SQL, get in touch with us at Imarticus Learning by visiting our offline centers in major cities throughout India or through chat support. If you are a fresh graduate or have just started your career, Imarticus offers an extensive data science certification program that covers each aspect necessary for data scientists.